Preface


This volume presents selected papers from the twelfth International Conference
on Spectral and High-Order Methods (ICOSAHOM '18), which was held in London,
United Kingdom, during the week of July 9-13, 2018. These selected papers were
refereed by members of the scientific committee of ICOSAHOM, as well as by other
leading scientists.

The first ICOSAHOM conference was held in Como, Italy, in 1989 and marked 
the beginning of an international conference series in Montpellier, France (1992); 
Houston, TX, USA (1995); Tel Aviv, Israel (1998); Uppsala, Sweden (2001); 
Providence, RI, USA (2004); Beijing, China (2007); Trondheim, Norway (2009); 
Gammarth, Tunisia (2012); Salt Lake City, USA (2014); and Rio de Janeiro, Brazil 
(2016). 

ICOSAHOM has established itself as the main meeting place for researchers 
with interests in the theoretical, applied, and computational aspects of high-order 
methods for the numerical solution of partial differential equations. 

With over 360 attendees, ICOSAHOM '18 was the largest edition of the
conference series to date. The program consisted of eight invited talks across
the week from internationally renowned researchers, alongside 40 minisymposia
(around 300 presentations) dedicated to specialized topics in high-order methods,
and approximately 90 further contributed talks.

The content of these proceedings is organized as follows. First, contributions 
from the invited speakers are included. The remainder of the volume consists of 
refereed selected papers highlighting the broad spectrum of topics presented at 
ICOSAHOM '18. 

The success of ICOSAHOM '18 was ensured through generous contributions
and financial support of our sponsors: the Air Force Office of Scientific Research
(AFOSR); the Platform for Research in Simulation Methods (PRISM) platform
grant, funded by the Engineering and Physical Sciences Research Council (EPSRC);
Rolls-Royce Ltd.; and, finally, the Department of Aeronautics at Imperial College
London.

We would like to give special thanks to our local organizing committee for
their efforts in organizing and promoting the event. In particular, we would also
like to thank Mr. Andrea Cassinelli for his organizational efforts leading up to the
conference, as well as the administrative staff of the Department of Aeronautics at
Imperial College London for their help in coordinating the logistics of the event. We
also thank the many student helpers, who all contributed to the smooth running of
the event, for the advice, help, and support they gave to the delegates during the
event itself.


London, UK Spencer J. Sherwin 
Exeter, UK David Moxey 
London, UK Joaquim Peiró 
London, UK Peter E. Vincent 


Zürich, Switzerland Christoph Schwab 
Part I
Invited Papers 


Stability of Wall Boundary Condition
Procedures for Discontinuous Galerkin
Spectral Element Approximations
of the Compressible Euler Equations


Florian J. Hindenlang, Gregor J. Gassner, and David A. Kopriva 


1 Introduction 


The ingredients for a reliable numerical method for the approximation of partial
differential equations, that is, one that will not blow up, include stable inter-element
and physical boundary condition implementations. The recognition that the
discontinuous Galerkin spectral element method (DGSEM) with Gauss-Lobatto quadratures
satisfies the summation-by-parts (SBP) property [4, 7] has made it possible to analyze
these schemes and to connect them with penalty collocation and SBP finite
difference schemes. For instance, in [5], we showed that a split form approximation
of the compressible Navier-Stokes equations was both linearly and entropy stable
provided that the boundary conditions were properly imposed.

Stable boundary condition procedures for hyperbolic equations have long been
studied, especially in relation to finite difference methods,
e.g. [3, 9, 10]. Only recently have they been studied for discontinuous Galerkin
approximations. In [12], the authors showed that the reflection approach is stable
when using an entropy conserving flux and an additional entropy stable dissipation
term (EC-ES). In [2], the authors showed that the reflection condition is stable if the
numerical flux is either the Godunov or the HLL flux.

In this paper, we analyze both the linear and entropy stability of two commonly
used types of wall boundary condition procedures for the DGSEM
applied to the compressible Euler equations. In both cases, wall boundary conditions
are implemented through a numerical flux: either a special wall numerical flux that
includes the boundary condition, or a fictitious external state applied to a Riemann
solver approximation. We show how to construct special wall numerical fluxes that
are stable, and study the behavior of the approximations. In particular, we show that
the use of Riemann solvers at the boundaries introduces numerical dissipation in an
amount that depends on the size of the normal Mach number at the wall.


2 The Compressible Euler Equations and the Wall Boundary 
Condition 


We write the Euler equations as

\[
\mathbf{u}_t + \sum_{i=1}^{3} \frac{\partial \mathbf{f}_i}{\partial x_i} = 0. \tag{1}
\]


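The state vector contains the conservative variables

\[
\mathbf{u} = [\varrho \;\; \varrho\vec{v} \;\; E]^T = [\varrho \;\; \varrho v_1 \;\; \varrho v_2 \;\; \varrho v_3 \;\; E]^T. \tag{2}
\]

In standard form, the components of the advective fluxes are

\[
\mathbf{f}_1 = \begin{bmatrix} \varrho v_1 \\ \varrho v_1^2 + p \\ \varrho v_1 v_2 \\ \varrho v_1 v_3 \\ (E + p)\,v_1 \end{bmatrix}, \quad
\mathbf{f}_2 = \begin{bmatrix} \varrho v_2 \\ \varrho v_2 v_1 \\ \varrho v_2^2 + p \\ \varrho v_2 v_3 \\ (E + p)\,v_2 \end{bmatrix}, \quad
\mathbf{f}_3 = \begin{bmatrix} \varrho v_3 \\ \varrho v_3 v_1 \\ \varrho v_3 v_2 \\ \varrho v_3^2 + p \\ (E + p)\,v_3 \end{bmatrix}. \tag{3}
\]

Here, $\varrho$, $\vec{v} = (v_1, v_2, v_3)^T$, $p$, $E$ are the mass density, fluid velocities, pressure and
total energy. We close the system with the ideal gas assumption, which relates the
total energy and pressure

\[
p = (\gamma - 1)\left(E - \tfrac{1}{2}\varrho\,\|\vec{v}\|^2\right), \tag{4}
\]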
where $\gamma$ denotes the adiabatic coefficient. For a compact notation that simplifies the
analysis, we define block vectors (with the double arrow)

\[
\overset{\leftrightarrow}{\mathbf{f}} = [\mathbf{f}_1 \;\; \mathbf{f}_2 \;\; \mathbf{f}_3]^T, \tag{5}
\]

so that the system of equations can be written in the compact form

\[
\mathbf{u}_t + \vec{\nabla}_x \cdot \overset{\leftrightarrow}{\mathbf{f}} = 0. \tag{6}
\]


The linear Euler equations are derived by linearizing about a constant mean state
$(\bar{\varrho}, \bar{v}_1, \bar{v}_2, \bar{v}_3, \bar{p})$. We follow [11] for the symmetrization of the linearized equations,
with the constants


| (7) 


where $c$ is the sound speed of the constant mean state. The state variables become

\[
\mathbf{u} = [\varrho' \;\; v_1 \;\; v_2 \;\; v_3 \;\; p']^T, \tag{8}
\]
where v is the velocity perturbation from the mean state, and we introduce 


der, Ире (9) 
0’ Qa уут 


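which depend on the density and pressure perturbations $\varrho$, $p$. The flux vectors are

\[
\mathbf{f}_i = A_i \mathbf{u}, \qquad \overset{\leftrightarrow}{\mathbf{f}} = \overset{\leftrightarrow}{A}\,\mathbf{u} = \left(A_1 \hat{x} + A_2 \hat{y} + A_3 \hat{z}\right)\mathbf{u}, \tag{10}
\]

where [11]

\[
A_1 = \begin{bmatrix} \bar{v}_1 & b & 0 & 0 & 0 \\ b & \bar{v}_1 & 0 & 0 & a \\ 0 & 0 & \bar{v}_1 & 0 & 0 \\ 0 & 0 & 0 & \bar{v}_1 & 0 \\ 0 & a & 0 & 0 & \bar{v}_1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} \bar{v}_2 & 0 & b & 0 & 0 \\ 0 & \bar{v}_2 & 0 & 0 & 0 \\ b & 0 & \bar{v}_2 & 0 & a \\ 0 & 0 & 0 & \bar{v}_2 & 0 \\ 0 & 0 & a & 0 & \bar{v}_2 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} \bar{v}_3 & 0 & 0 & b & 0 \\ 0 & \bar{v}_3 & 0 & 0 & 0 \\ 0 & 0 & \bar{v}_3 & 0 & 0 \\ b & 0 & 0 & \bar{v}_3 & a \\ 0 & 0 & 0 & a & \bar{v}_3 \end{bmatrix} \tag{11}
\]

are constant symmetric matrices.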
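The linear equations have the property that the $L^2$ norm of the solution over a
domain $\Omega$ is bounded in terms of the boundary data on $\partial\Omega$ only. Let

\[
\langle \mathbf{v}, \mathbf{w} \rangle = \int_\Omega \mathbf{v}^T \mathbf{w} \, dx\,dy\,dz, \qquad
\langle \overset{\leftrightarrow}{\mathbf{f}}, \overset{\leftrightarrow}{\mathbf{g}} \rangle = \int_\Omega \sum_{i=1}^{3} \mathbf{f}_i^T \mathbf{g}_i \, dx\,dy\,dz \tag{12}
\]

represent the $L^2$ inner product of two state vectors $\mathbf{v}$ and $\mathbf{w}$ and two block vectors $\overset{\leftrightarrow}{\mathbf{f}}$
and $\overset{\leftrightarrow}{\mathbf{g}}$, respectively. Since the coefficient matrices are constant, the product rule and
symmetry of $\overset{\leftrightarrow}{A}$ imply

\[
\langle \mathbf{u}, \vec{\nabla}_x \cdot \overset{\leftrightarrow}{\mathbf{f}} \rangle = \langle \mathbf{u}, \vec{\nabla}_x \cdot (\overset{\leftrightarrow}{A}\,\mathbf{u}) \rangle = \langle \vec{\nabla}_x \mathbf{u}, \overset{\leftrightarrow}{\mathbf{f}} \rangle. \tag{13}
\]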


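Then it follows from Gauss' law (integration by parts) that

\[
\langle \mathbf{u}, \vec{\nabla}_x \cdot \overset{\leftrightarrow}{\mathbf{f}} \rangle = \frac{1}{2} \int_{\partial\Omega} \mathbf{u}^T\, \overset{\leftrightarrow}{\mathbf{f}} \cdot \vec{n} \, dS, \tag{14}
\]

where $\vec{n}$ is the outward normal to the surface of $\Omega$. The norm of the solution
therefore satisfies

\[
\frac{d}{dt} \|\mathbf{u}\|^2 = -\int_{\partial\Omega} \mathbf{u}^T\, \overset{\leftrightarrow}{\mathbf{f}} \cdot \vec{n} \, dS. \tag{15}
\]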
dQ 


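Replacing the boundary terms by boundary conditions leads to a bound on the
solution in terms of the boundary data. The argument of the boundary integral on
the right of (15) is

\[
\mathbf{u}^T\, \overset{\leftrightarrow}{\mathbf{f}} \cdot \vec{n} = \mathbf{u}^T \left(\overset{\leftrightarrow}{A} \cdot \vec{n}\right) \mathbf{u} = 2\,(b\varrho' + a p')\, v_n + (\bar{\vec{v}} \cdot \vec{n})\, |\mathbf{u}|^2, \tag{16}
\]

where $v_n$ is the wall normal velocity, $v_n = \vec{v} \cdot \vec{n}$. Note that here, the mean flow must
be chosen such that the normal flow vanishes at the wall boundary, $\bar{\vec{v}} \cdot \vec{n} = 0$, so that
the boundary condition makes physical sense.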
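The quadratic form on the boundary can be checked numerically. The following is a minimal sketch, assuming the matrix structure of (11); the values of `a`, `b`, the normal, the mean velocity and the state are arbitrary illustrative choices, and the helper names are not taken from the paper.

```python
def normal_matrix(n, vbar, a, b):
    """Return A_n = n1*A1 + n2*A2 + n3*A3 as a 5x5 nested list."""
    vbar_n = sum(v * k for v, k in zip(vbar, n))
    A = [[0.0] * 5 for _ in range(5)]
    for k in range(5):
        A[k][k] = vbar_n                      # mean-flow advection on the diagonal
    for i in range(3):
        A[0][i + 1] = A[i + 1][0] = b * n[i]  # density/velocity coupling
        A[4][i + 1] = A[i + 1][4] = a * n[i]  # pressure/velocity coupling
    return A


def quadratic_form(A, u):
    """Evaluate u^T A u."""
    return sum(u[i] * A[i][j] * u[j] for i in range(5) for j in range(5))


# Illustrative values (assumptions, not from the paper).
a, b = 0.5, 0.8
n = (0.6, 0.0, 0.8)             # unit wall normal
vbar = (0.3, -0.2, 0.1)         # mean velocity, deliberately not tangential
u = [0.2, 0.5, -0.1, 0.3, 0.7]  # [rho', v1, v2, v3, p']

v_n = sum(u[i + 1] * n[i] for i in range(3))
vbar_n = sum(v * k for v, k in zip(vbar, n))
lhs = quadratic_form(normal_matrix(n, vbar, a, b), u)
rhs = 2.0 * (b * u[0] + a * u[4]) * v_n + vbar_n * sum(x * x for x in u)
```

With a tangential mean flow the second term drops out, and only the product of the normal velocity with $b\varrho' + a p'$ remains, which is the quantity the wall condition controls.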

Therefore, with the no penetration wall condition $v_n = 0$ applied,

\[
\frac{d}{dt} \|\mathbf{u}\|^2 = 0, \tag{17}
\]


and the (energy) norm of the solution is bounded for all time by its initial value.

The nonlinear equations, on the other hand, satisfy a bound on the entropy that
depends only on the boundary data. For what follows, we assume that the solution
is smooth so that we do not have to consider entropy generated at shock waves. We
introduce the entropy density (scaled with $(\gamma - 1)$ for convenience) as

\[
s = -\frac{\varrho\,\varsigma}{\gamma - 1}, \tag{18}
\]


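where $\varsigma = \ln(p) - \gamma \ln(\varrho)$ is the physical entropy. (The minus sign is conventional
in the theory of hyperbolic conservation laws to ensure a decreasing entropy
function.) The entropy flux for the Euler equations is

\[
\vec{f}^{\,s} = \vec{v}\, s = -\frac{\varrho\,\varsigma}{\gamma - 1}\, \vec{v}. \tag{19}
\]

Finally the entropy variables are

\[
\mathbf{w} = \frac{\partial s}{\partial \mathbf{u}} = \left[ \frac{\gamma - \varsigma}{\gamma - 1} - \beta \|\vec{v}\|^2, \;\; 2\beta v_1, \;\; 2\beta v_2, \;\; 2\beta v_3, \;\; -2\beta \right]^T, \qquad \beta = \frac{\varrho}{2p}. \tag{20}
\]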


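The entropy pair contracts the solution and fluxes, meaning that it satisfies the
relations

\[
\mathbf{w}^T \mathbf{u}_t = \left( \frac{\partial s}{\partial \mathbf{u}} \right)^T \mathbf{u}_t = s_t, \qquad
\mathbf{w}^T\, \vec{\nabla}_x \cdot \overset{\leftrightarrow}{\mathbf{f}} = \vec{\nabla}_x \cdot \vec{f}^{\,s}. \tag{21}
\]

When we multiply (6) with the entropy variables and integrate over the domain, we get

\[
\langle \mathbf{w}, \mathbf{u}_t \rangle + \langle \mathbf{w}, \vec{\nabla}_x \cdot \overset{\leftrightarrow}{\mathbf{f}} \rangle = 0. \tag{22}
\]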


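Next we use the properties of the entropy pair to contract (22) and use integration
by parts to get

\[
\langle s_t, 1 \rangle = -\langle \vec{\nabla}_x \cdot \vec{f}^{\,s}, 1 \rangle = -\int_{\partial\Omega} \left( \vec{f}^{\,s} \cdot \vec{n} \right) dS, \tag{23}
\]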


showing that, in the continuous case, the total entropy in the domain can only change 
via the boundary conditions. 

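In the case of a zero-mass flux boundary condition, with $v_n = \vec{v} \cdot \vec{n} = 0$, the
entropy is not changed by the slip-wall boundary condition, since

\[
\vec{f}^{\,s} \cdot \vec{n} = -\frac{\varrho\,\varsigma}{\gamma - 1}\, (\vec{v} \cdot \vec{n}) = 0. \tag{24}
\]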
3 Stability Bounds for the DGSEM


The DGSEM is described in detail in [5] and elsewhere [1, 6]. We will only
quickly summarize the approximation here. The domain $\Omega$ is subdivided into
non-overlapping, conforming, hexahedral elements. Each element is mapped to
the reference element $E = [-1, 1]^3$. Associated with the transformation from the
reference element is a set of contravariant coordinate vectors, $\vec{a}^{\,i}$, and a transformation
Jacobian, $J$. Equation (6) transforms to another conservation law on the reference
element as

\[
J \mathbf{u}_t + \vec{\nabla}_\xi \cdot \overset{\leftrightarrow}{\tilde{\mathbf{f}}} = 0, \tag{25}
\]

where $\overset{\leftrightarrow}{\tilde{\mathbf{f}}}$ is the contravariant flux vector with components $\tilde{\mathbf{f}}^i = J \vec{a}^{\,i} \cdot \overset{\leftrightarrow}{\mathbf{f}}$.

The approximation of (25) proceeds as follows: A weak form is created by taking
the inner product of the equation with a test function. The Gauss law is applied to
the divergence term to separate the boundary from the interior contributions. The
resulting weak form is then approximated: The solution vector is approximated by a
polynomial of degree $N$ interpolated at the Legendre-Gauss-Lobatto points. In the
following, we will represent the true continuous solutions by lower case letters. Upper
case letters will denote their polynomial approximations, except for the density,
where the approximation is denoted by $\rho$. The volume fluxes are replaced by
two-point numerical fluxes. In the linear case, the two-point fluxes are immediately
relatable to a split form of the equations. Integrals are replaced by
Legendre-Gauss-Lobatto quadratures. Finally, the boundary fluxes are replaced by a numerical flux.
See [5] and [8] for details.

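The result is an approximation that is energy stable for the linearized equations if
at every quadrature point along a physical boundary the numerical flux $\tilde{\mathbf{F}}^*$ satisfies
the bound [5]

\[
\mathbf{U}^T \left( \tilde{\mathbf{F}}^* - \tfrac{1}{2}\, \overset{\leftrightarrow}{\tilde{\mathbf{F}}} \cdot \hat{n} \right) \ge 0, \tag{26}
\]

where $\overset{\leftrightarrow}{\tilde{\mathbf{F}}}$ is the polynomial interpolation of the contravariant flux from the interior, $\hat{n}$
is the reference space outward normal direction, and $\mathbf{U}$ is the approximation of the
state vector. Since the contravariant fluxes are proportional to the normal fluxes [6],
we can change the condition (26) to

\[
B_L \equiv \mathbf{U}^T \left( \mathbf{F}^* - \tfrac{1}{2}\, \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right) \ge 0. \tag{27}
\]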
For entropy stability of the nonlinear equations, the boundary stability condition
shown in [5] is proportional to

\[
B_{NL} \equiv \mathbf{W}^T \left( \mathbf{F}^* - \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right) + \left( \vec{F}^{\,s} \cdot \vec{n} \right) \ge 0, \tag{28}
\]

where $\vec{F}^{\,s}$ is the polynomial interpolation of the entropy flux, $\vec{f}^{\,s}$, and $\mathbf{W}$ is the
interpolation of the entropy variables.


3.1 Linear Stability of Wall Boundary Condition 
Approximations 


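To find linearly stable implementations of the wall condition $v_n = 0$, one needs
only find a numerical flux that satisfies it and the condition (27). For the linear
equations, the approximation of the state vector is $\mathbf{U} = [\rho' \;\; \vec{V} \;\; P']^T$ and the normal
contravariant flux is proportional to

\[
\overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} = \overset{\leftrightarrow}{A} \cdot \vec{n}\; \mathbf{U} = [\,bV_n \;\; n_1 Q \;\; n_2 Q \;\; n_3 Q \;\; aV_n\,]^T, \tag{29}
\]

where $V_n$ is the approximation of the normal velocity at the wall computed from the
interior, $Q = b\rho' + aP'$, and $(n_1, n_2, n_3)$ are the three components of the physical
space normal vector, $\vec{n}$. The numerical flux can be expressed as

\[
\mathbf{F}^* = \overset{\leftrightarrow}{A} \cdot \vec{n}\; \mathbf{U}^* = [\,bV_n^* \;\; n_1 Q^* \;\; n_2 Q^* \;\; n_3 Q^* \;\; aV_n^*\,]^T. \tag{30}
\]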


It then remains only to find $Q^*$ so that (27) is satisfied when the normal wall
condition $V_n^* = 0$ is applied. When we substitute the fluxes (29) and (30) into
(27),

\[
B_L = \tfrac{1}{2} \left[ Q\,(2V_n^* - V_n) + V_n\,(2Q^* - Q) \right] = Q V_n^* + V_n\,(Q^* - Q). \tag{31}
\]


Substituting the wall boundary condition $V_n^* = 0$ yields the condition on $Q^*$ for
stability

\[
V_n\,(Q^* - Q) \ge 0. \tag{32}
\]

Neutral stability is thus ensured if $\rho^*$ and $P^*$ are computed from the interior, i.e.
$\rho^* = \rho'$, $P^* = P'$ so that $Q^* = Q$.

In practice, the boundary condition is also implemented through the use of
a Riemann solver and an external state designed to imply the physical boundary
condition to construct the numerical boundary flux. The exact upwind ($\sigma = 1$)
normal Riemann flux and the central flux ($\sigma = 0$) for the linear system of equations are

\[
\mathbf{F}^*(\mathbf{U}, \mathbf{U}^{ext}) = \tfrac{1}{2} \left[ \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}) \cdot \vec{n} + \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^{ext}) \cdot \vec{n} \right] - \tfrac{\sigma}{2}\, |A_n| \left( \mathbf{U}^{ext} - \mathbf{U} \right), \tag{33}
\]

where $A_n = \overset{\leftrightarrow}{A} \cdot \vec{n}$ is the normal coefficient matrix. The external state is set by using
the interior values of the density and pressure and the negative of the value of the
normal velocity,

\[
\mathbf{U}^{ext} = [\,\rho' \;\; (\vec{V} - 2V_n \vec{n}) \;\; P'\,]^T. \tag{34}
\]


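For $\sigma = 0$, using the central (averaged) numerical flux, the interior flux
contribution cancels and condition (27) reduces to

\[
B_{L,0} = \tfrac{1}{2}\, \mathbf{U}^T A_n \mathbf{U}^{ext}
= \tfrac{1}{2}\, [\,\rho' \;\; \vec{V}^T \;\; P'\,]
\begin{bmatrix} 0 & n_1 b & n_2 b & n_3 b & 0 \\ n_1 b & 0 & 0 & 0 & n_1 a \\ n_2 b & 0 & 0 & 0 & n_2 a \\ n_3 b & 0 & 0 & 0 & n_3 a \\ 0 & n_1 a & n_2 a & n_3 a & 0 \end{bmatrix}
\begin{bmatrix} \rho' \\ \vec{V} - 2V_n \vec{n} \\ P' \end{bmatrix}
= \tfrac{1}{2} \left[ Q\,(-\vec{V} \cdot \vec{n}) + (\vec{V} \cdot \vec{n})\, Q \right] = 0, \tag{35}
\]

which is neutrally stable, having no additional stabilizing dissipation. We note again
that the mean state for the linearization is chosen such that the normal mean velocity
components are zero, resulting in the zeros on the diagonal of $A_n$.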

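Substituting the exact upwind flux where $\sigma = 1$ into (27) and rearranging,

\[
B_{L,1} = \mathbf{U}^T A_n^- \mathbf{U}^{ext} + \tfrac{1}{2}\, \mathbf{U}^T |A_n|\, \mathbf{U}, \tag{36}
\]

where $A_n^- = \tfrac{1}{2} \left( A_n - |A_n| \right)$ is negative semidefinite. The second term is
non-negative, depends only on the interior state, and adds stabilizing dissipation. From
the matrix absolute value, the dissipation term is

\[
\mathbf{U}^T |A_n|\, \mathbf{U} = \frac{1}{c}\, Q^2 + c^3\, \mathrm{Ma}_n^2, \tag{37}
\]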

where $\mathrm{Ma}_n = V_n / c$ is the normal Mach number. Stability depends, then, on the
value of the first term, which is where the boundary conditions are incorporated
through the external state $\mathbf{U}^{ext}$ written in (34). Then

\[
\mathbf{U}^T A_n^- \mathbf{U}^{ext} = -\frac{1}{2c}\, Q^2 + \frac{c^3}{2}\, \mathrm{Ma}_n^2. \tag{38}
\]

Therefore, using the upwind numerical flux, (36) becomes

\[
B_{L,1} = c^3\, \mathrm{Ma}_n^2 \ge 0, \tag{39}
\]

as required. The amount of dissipation depends on how far the interior computed
normal velocity deviates from zero.

The combination of the reflective state and local Lax-Friedrichs flux is also
linearly stable. In that case the exact matrix absolute value is replaced by a diagonal
matrix, $|A_n| \approx |\lambda|_{max}\, I$. The jump term is added to the central (averaged) flux so

\[
B_{L,LF} = -\frac{|\lambda|_{max}}{2}\, \mathbf{U}^T \left( \mathbf{U}^{ext} - \mathbf{U} \right) = |\lambda|_{max}\, c^2\, \mathrm{Ma}_n^2 \ge 0. \tag{40}
\]

Finally, a dissipative version of the direct numerical flux (30) can be formed by
looking at the reflective state approach. For instance, the equivalent to using the
Lax-Friedrichs flux is to choose $\rho^* = \rho'$ and

\[
P^* = P' + \frac{c}{a}\, |\lambda|_{max}\, \mathrm{Ma}_n. \tag{41}
\]

Then $Q^* = Q + c\, |\lambda|_{max}\, \mathrm{Ma}_n$ and

\[
V_n\,(Q^* - Q) = c^2\, |\lambda|_{max}\, \mathrm{Ma}_n^2 \ge 0. \tag{42}
\]

A similar, though more complicated, modified $P^*$ can be made to be equivalent to
the exact upwind flux.
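The three linear boundary terms can be reproduced with a few lines of arithmetic. The sketch below assumes a vanishing mean normal velocity and constants with $a^2 + b^2 = c^2$ (the specific values of `a` and `b` are an assumption made here for illustration), so that the nonzero eigenvalues of $A_n$ are $\pm c$ and $|A_n| = A_n^2 / c$. Function and variable names are hypothetical.

```python
import math


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def apply_An(u, n, a, b):
    """A_n u for zero mean normal velocity: [b v_n, n1 Q, n2 Q, n3 Q, a v_n]."""
    v_n = dot(u[1:4], n)
    q = b * u[0] + a * u[4]
    return [b * v_n, n[0] * q, n[1] * q, n[2] * q, a * v_n]


def boundary_term(u, n, a, b, sigma=0.0, lam_max=None):
    """B_L = U^T (F* - F.n / 2) with F* built from the reflective external state."""
    c = math.hypot(a, b)
    v_n = dot(u[1:4], n)
    u_ext = [u[0]] + [u[i + 1] - 2.0 * v_n * n[i] for i in range(3)] + [u[4]]
    jump = [e - i for e, i in zip(u_ext, u)]
    f_int = apply_An(u, n, a, b)
    f_ext = apply_An(u_ext, n, a, b)
    if lam_max is None:
        # Exact |A_n| applied to the jump, using |A_n| = A_n^2 / c.
        twice = apply_An(apply_An(jump, n, a, b), n, a, b)
        diss = [0.5 * sigma * x / c for x in twice]
    else:
        # Local Lax-Friedrichs: |A_n| replaced by lam_max * I.
        diss = [0.5 * lam_max * x for x in jump]
    f_star = [0.5 * (fi + fe) - d for fi, fe, d in zip(f_int, f_ext, diss)]
    return dot(u, [fs - 0.5 * fi for fs, fi in zip(f_star, f_int)])


gamma, c = 1.4, 1.0
a = c * math.sqrt((gamma - 1.0) / gamma)  # assumed constants, a^2 + b^2 = c^2
b = c / math.sqrt(gamma)
n = (0.6, 0.0, 0.8)                       # unit wall normal
u = [0.2, 0.5, -0.1, 0.3, 0.7]            # [rho', V1, V2, V3, P']
ma_n = dot(u[1:4], n) / c

b_central = boundary_term(u, n, a, b, sigma=0.0)
b_upwind = boundary_term(u, n, a, b, sigma=1.0)
b_llf = boundary_term(u, n, a, b, lam_max=c)
```

Under these assumptions the central flux returns zero, while the upwind and Lax-Friedrichs variants return a positive value that grows with the square of the normal Mach number.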


3.2 Entropy Stability of Wall Boundary Condition 
Approximations 


As in the linear approximation, the wall boundary condition can be imposed for the
nonlinear equations either by directly specifying the numerical flux or by computing
it through a Riemann solver using a reflection external state that enforces the normal
wall condition implicitly. Note that in this section, the discrete variables $(\rho, \vec{V}, P)$
describe the full nonlinear state.
For the nonlinear equations, we construct the numerical flux for a slip-wall as

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^* = \begin{bmatrix} 0 \\ P^* \vec{n} \\ 0 \end{bmatrix}, \tag{43}
\]

where we imposed $V_n = 0$ leading to a flux with no mass or energy transfer, and we
introduce a wall pressure $P^*$, whose value will be chosen to ensure consistency and
stability.
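After some manipulations, the discrete entropy stability condition (28) becomes, with $\beta = \rho / (2P)$,

\[
\begin{aligned}
B_{NL} &= -\rho V_n \left( \frac{\gamma - \varsigma}{\gamma - 1} - \beta \|\vec{V}\|^2 \right) - \frac{\rho\,\varsigma\,V_n}{\gamma - 1}
 + 2\beta V_n \left( P^* - P - \rho \|\vec{V}\|^2 \right) + 2\beta V_n\,(E + P) \\
&= -\rho V_n \left( \frac{\gamma}{\gamma - 1} - \beta \|\vec{V}\|^2 \right) + 2\beta V_n \left( P^* + E - \rho \|\vec{V}\|^2 \right) \\
&= -\rho V_n \left( \frac{\gamma}{\gamma - 1} - \beta \|\vec{V}\|^2 \right) + 2\beta V_n \left( \frac{P}{\gamma - 1} + \tfrac{1}{2} \rho \|\vec{V}\|^2 + P^* - \rho \|\vec{V}\|^2 \right) \\
&= \rho V_n \left( \frac{P^*}{P} - 1 \right) \ge 0.
\end{aligned} \tag{44}
\]

Therefore if we choose $P^* = P$, the internal pressure, the boundary flux
does not contribute to the total entropy, independent of the inner normal velocity
$V_n$. A value of $P^*$ that leads to a dissipative boundary condition can be found either
through exact solution of the Riemann problem at the boundary, or through the use
of an external state and an approximate Riemann solver.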
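The reduction of the entropy boundary term to $\rho V_n (P^*/P - 1)$ can be verified numerically. This is a minimal sketch, assuming the standard entropy variables for the scaled entropy $s = -\varrho\varsigma/(\gamma - 1)$ and the form $\mathbf{W}^T(\mathbf{F}^* - \mathbf{F}\cdot\vec{n}) + \vec{F}^{\,s}\cdot\vec{n}$ for the boundary term; the function name and the sample state are illustrative only.

```python
import math


def entropy_boundary_term(rho, vel, p, n, p_star, gamma=1.4):
    """Evaluate W^T (F* - F.n) + F^s.n for the slip-wall flux F* = [0, P* n, 0]."""
    v_n = sum(v * k for v, k in zip(vel, n))
    v2 = sum(v * v for v in vel)
    energy = p / (gamma - 1.0) + 0.5 * rho * v2      # ideal gas total energy
    sigma = math.log(p) - gamma * math.log(rho)      # physical entropy
    beta = rho / (2.0 * p)
    w = ([(gamma - sigma) / (gamma - 1.0) - beta * v2]
         + [2.0 * beta * v for v in vel]
         + [-2.0 * beta])                            # entropy variables
    f_n = ([rho * v_n]
           + [rho * v_n * v + p * k for v, k in zip(vel, n)]
           + [(energy + p) * v_n])                   # interior normal flux
    f_star = [0.0] + [p_star * k for k in n] + [0.0]
    f_s = -rho * sigma * v_n / (gamma - 1.0)         # normal entropy flux
    return sum(wi * (fs - fi) for wi, fs, fi in zip(w, f_star, f_n)) + f_s


# Illustrative interior state with a nonzero normal velocity (assumed values).
rho, vel, p = 1.2, (0.3, -0.4, 0.2), 0.9
n = (0.6, 0.0, 0.8)
v_n = sum(v * k for v, k in zip(vel, n))

b_neutral = entropy_boundary_term(rho, vel, p, n, p_star=p)
b_dissipative = entropy_boundary_term(rho, vel, p, n, p_star=1.3 * p)
```

Choosing the interior pressure gives a vanishing contribution, whereas a wall pressure above the interior value gives a positive contribution when the flow impinges on the wall.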


3.2.1 Exact Solution of the Riemann Problem 


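In [14] a symmetric 1D Riemann problem is exactly solved following Toro [13], to
get the wall pressure $P^*$, accounting for the fact that $V_n$ never vanishes discretely
and therefore the wall pressure should be different from the interior pressure. The
exact solution of the 1D Riemann problem reads as

\[
\left( \frac{P^*}{P} \right)_{RP} =
\begin{cases}
1 + \gamma\, \mathrm{Ma}_n \left( \dfrac{\gamma + 1}{4} \mathrm{Ma}_n + \sqrt{ \left( \dfrac{\gamma + 1}{4} \mathrm{Ma}_n \right)^2 + 1 } \right) > 1 & \text{for } V_n > 0, \\[2ex]
\left( 1 + \tfrac{1}{2} (\gamma - 1)\, \mathrm{Ma}_n \right)^{\frac{2\gamma}{\gamma - 1}} \le 1 & \text{for } V_n \le 0,
\end{cases} \tag{45}
\]

with the normal Mach number, $\mathrm{Ma}_n = V_n / c$, and the sound speed $c = \sqrt{\gamma P / \rho}$.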
As shown by Toro [13], the solution for the rarefaction has a limiting vacuum
solution for $\mathrm{Ma}_n \le -2(\gamma - 1)^{-1}$. We will restrict our analysis to normal Mach
numbers yielding strictly positive pressure solutions only ($\mathrm{Ma}_n > -5$ for $\gamma = 7/5$).

It is easy to see that using $P^*$ from (45), the entropy inequality (44) is still
satisfied for $|V_n| \ge 0$, and the added entropy scales with the discrete value of $V_n$
at the boundary. Hence, for $h \to 0$, the discrete boundary condition converges to
its physical counterpart, since $V_n \to 0$. The choice of $P^*$ from (45) appears to
stabilize under-resolved simulations, which can now be explained by the fact that
the boundary flux always adds entropy for $|V_n| \ne 0$.


3.2.2 Using Approximate Riemann Solvers for the Boundary Flux 


A well known strategy in finite volume methods is to mirror only the velocity of the
internal state and solve an approximate Riemann problem to get the boundary flux.
The main appeal is the simpler implementation, since an approximate Riemann
solver is already available and used for the fluxes between the elements. For DG
methods, see also, for example, [2] and [12], where reflection conditions are proved
to be entropy stable.

The mirror state is set so that the mass and energy flux are zero. Let the inner
state be labeled $L$ and the outer $R$. Then the inner and outer states that satisfy the
mirror condition are

\[
\mathbf{U}^L = [\,\rho \;\; \rho \vec{V} \;\; E\,]^T, \qquad \mathbf{U}^R = [\,\rho \;\; \rho (\vec{V} - 2V_n \vec{n}) \;\; E\,]^T. \tag{46}
\]

We show below under what conditions on the normal velocity $V_n$ the reflection
condition is entropy stable for the Lax-Friedrichs, HLL and HLLC, Roe and EC-ES
fluxes.


Lax-Friedrichs Flux 


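We start with the simplest approximate Riemann solver, the Lax-Friedrichs or
Rusanov flux, which reads as

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{LF} = \tfrac{1}{2} \left( \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^L) + \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^R) \right) \cdot \vec{n} - \tfrac{1}{2}\, \lambda_{max} \left( \mathbf{U}^R - \mathbf{U}^L \right). \tag{47}
\]

Inserting the states from (46), we get

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{LF}
= \begin{bmatrix} 0 \\ (\rho V_n^2 + P)\, \vec{n} \\ 0 \end{bmatrix}
- \tfrac{1}{2}\, \lambda_{max} \begin{bmatrix} 0 \\ -2 \rho V_n \vec{n} \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ (\rho V_n^2 + \lambda_{max}\, \rho V_n + P)\, \vec{n} \\ 0 \end{bmatrix}. \tag{48}
\]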
The maximum wave speed is normally approximated from the largest leftgoing and
rightgoing wave speed,

\[
\lambda_{max} = \max \left( |V_n^L| + c^L, \; |V_n^R| + c^R \right) = |V_n| + c, \quad \text{since } c^L = c^R = c, \;\; V_n = V_n^L = -V_n^R, \tag{49}
\]

and thus gives a definition of $P^*$

\[
\left( \frac{P^*}{P} \right)_{LF} = 1 + \gamma\, \mathrm{Ma}_n \left( \mathrm{Ma}_n + |\mathrm{Ma}_n| + 1 \right)
= \begin{cases}
1 + \gamma\, \mathrm{Ma}_n (2 \mathrm{Ma}_n + 1) > 1 & \text{for } V_n > 0, \\
1 + \gamma\, \mathrm{Ma}_n \le 1 & \text{for } V_n \le 0,
\end{cases} \tag{50}
\]

which shows that the Lax-Friedrichs flux satisfies the entropy inequality (44).
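The wall pressure implied by the mirror state can also be obtained by simply evaluating the Lax-Friedrichs flux. The sketch below does this for an ideal gas with conservative states $[\rho, \rho\vec{V}, E]$ and compares the result with the closed-form ratio $1 + \gamma\,\mathrm{Ma}_n(\mathrm{Ma}_n + |\mathrm{Ma}_n| + 1)$; the function names and the sample state are assumptions made for illustration.

```python
import math


def normal_flux(state, n, gamma):
    """Euler flux in direction n for a conservative state [rho, rho*V, E]."""
    rho, mom, energy = state[0], state[1:4], state[4]
    vel = [m / rho for m in mom]
    v_n = sum(v * k for v, k in zip(vel, n))
    p = (gamma - 1.0) * (energy - 0.5 * rho * sum(v * v for v in vel))
    return ([rho * v_n]
            + [rho * v_n * v + p * k for v, k in zip(vel, n)]
            + [(energy + p) * v_n])


def lax_friedrichs_wall_flux(rho, vel, p, n, gamma=1.4):
    """Lax-Friedrichs flux between the interior state and its mirrored state."""
    v_n = sum(v * k for v, k in zip(vel, n))
    energy = p / (gamma - 1.0) + 0.5 * rho * sum(v * v for v in vel)
    mirrored = [v - 2.0 * v_n * k for v, k in zip(vel, n)]
    u_l = [rho] + [rho * v for v in vel] + [energy]
    u_r = [rho] + [rho * v for v in mirrored] + [energy]
    lam_max = abs(v_n) + math.sqrt(gamma * p / rho)
    f_l = normal_flux(u_l, n, gamma)
    f_r = normal_flux(u_r, n, gamma)
    return [0.5 * (fl + fr) - 0.5 * lam_max * (ur - ul)
            for fl, fr, ul, ur in zip(f_l, f_r, u_l, u_r)]


gamma = 1.4
rho, vel, p = 1.2, (0.3, -0.4, 0.2), 0.9   # illustrative interior state
n = (0.6, 0.0, 0.8)                        # unit wall normal

flux = lax_friedrichs_wall_flux(rho, vel, p, n, gamma)
ma_n = sum(v * k for v, k in zip(vel, n)) / math.sqrt(gamma * p / rho)
ratio_from_flux = flux[1] / (n[0] * p)     # P*/P read off the x-momentum flux
ratio_closed_form = 1.0 + gamma * ma_n * (ma_n + abs(ma_n) + 1.0)
```

The mass and energy components of `flux` vanish up to round-off, and the momentum components are a multiple of the normal, which is what allows the flux to be summarized by a single wall pressure.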
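HLL and HLLC Flux


The HLL flux [13] is written as

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{HLL} = \frac{1}{S^R - S^L} \left( S^R\, \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^L) - S^L\, \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^R) \right) \cdot \vec{n} + \frac{S^L S^R}{S^R - S^L} \left( \mathbf{U}^R - \mathbf{U}^L \right). \tag{51}
\]

The leftgoing and rightgoing wave speeds are $S^L = V_n^L - c^L = -(V_n^R + c^R) = -S^R$
and the HLL flux reduces to

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{HLL} = \tfrac{1}{2} \left( \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^L) + \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^R) \right) \cdot \vec{n} - \frac{S^R}{2} \left( \mathbf{U}^R - \mathbf{U}^L \right). \tag{52}
\]

If we chose $S^R$ to be the maximum wave speed, the HLL flux would reduce
to the Lax-Friedrichs flux. However, with $S^R = V_n^R + c^R = -V_n + c$, an even
simpler relation for $P^*$ is found, which also satisfies the entropy inequality

\[
\left( \frac{P^*}{P} \right)_{HLL} = 1 + \gamma\, \mathrm{Ma}_n
\begin{cases}
> 1 & \text{for } V_n > 0, \\
\le 1 & \text{for } V_n \le 0.
\end{cases} \tag{53}
\]

For the HLLC flux [13], one can show that since the Riemann problem is symmetric,
the approximate wave speed of the contact discontinuity is $\lambda^* = 0$ and, choosing
$S^R = -V_n + c$, HLLC reduces to the HLL flux,

\[
\left( \frac{P^*}{P} \right)_{HLLC} = 1 + \gamma\, \mathrm{Ma}_n = \left( \frac{P^*}{P} \right)_{HLL}. \tag{54}
\]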
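Roe Flux


For the original Roe method without entropy fix [13], the mean values are

\[
\hat{V}_n = \frac{\sqrt{\rho^L}\, V_n^L + \sqrt{\rho^R}\, V_n^R}{\sqrt{\rho^L} + \sqrt{\rho^R}} = 0, \quad
\hat{V}_{t_1} = V_{t_1}, \quad \hat{V}_{t_2} = V_{t_2}, \quad
\hat{H} = H = \frac{E + P}{\rho}, \quad
\hat{c} = c\, \sqrt{1 + \frac{\gamma - 1}{2}\, \mathrm{Ma}_n^2}. \tag{55}
\]

After some manipulations,

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{Roe}
= \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^L) \cdot \vec{n} + \alpha_1 \hat{\lambda}_1 K^1
= \overset{\leftrightarrow}{\mathbf{F}}(\mathbf{U}^L) \cdot \vec{n} - \rho V_n \begin{bmatrix} 1 \\ \hat{\vec{V}} - \hat{c}\, \vec{n} \\ \frac{1}{\rho} (E + P) \end{bmatrix}
= \begin{bmatrix} 0 \\ (\rho V_n^2 + \rho V_n \hat{c} + P)\, \vec{n} \\ 0 \end{bmatrix}, \tag{56}
\]

with $\hat{\vec{V}} = \vec{V} - V_n \vec{n}$, $\hat{\lambda}_1 = \hat{V}_n - \hat{c} = -\hat{c}$, $\alpha_1 = \rho V_n / \hat{c}$ and $K^1$ from [13]. This leads again to a
definition of $P^*$

\[
\left( \frac{P^*}{P} \right)_{Roe} = 1 + \gamma\, \mathrm{Ma}_n \left( \mathrm{Ma}_n + \sqrt{1 + \frac{\gamma - 1}{2}\, \mathrm{Ma}_n^2} \right), \tag{57}
\]

which fulfills the entropy inequality as long as

\[
\mathrm{Ma}_n \ge -\sqrt{\frac{2}{3 - \gamma}}, \qquad \gamma = \frac{7}{5} \;\Rightarrow\; \mathrm{Ma}_n \gtrsim -1.12. \tag{58}
\]

Thus, the Roe flux is entropy stable for shocks, but not for supersonic rarefactions.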
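The Mach number limit for the Roe flux follows from requiring $\mathrm{Ma}_n (P^*/P - 1) \ge 0$. A small sketch, assuming the pressure ratio $1 + \gamma\,\mathrm{Ma}_n(\mathrm{Ma}_n + \sqrt{1 + \tfrac{\gamma-1}{2}\mathrm{Ma}_n^2})$ and using hypothetical function names, evaluates the sign of this product on both sides of the limit:

```python
import math


def roe_pressure_ratio(ma_n, gamma=1.4):
    """Wall pressure ratio P*/P implied by the Roe flux with a mirrored state."""
    return 1.0 + gamma * ma_n * (ma_n + math.sqrt(1.0 + 0.5 * (gamma - 1.0) * ma_n**2))


def roe_entropy_factor(ma_n, gamma=1.4):
    """Ma_n * (P*/P - 1); its sign is the sign of the entropy boundary term."""
    return ma_n * (roe_pressure_ratio(ma_n, gamma) - 1.0)


def roe_mach_limit(gamma=1.4):
    """Smallest normal Mach number for which the factor stays non-negative."""
    return -math.sqrt(2.0 / (3.0 - gamma))


limit = roe_mach_limit()                 # about -1.118 for gamma = 7/5
inside = roe_entropy_factor(-1.0)        # rarefaction above the limit
outside = roe_entropy_factor(-1.3)       # rarefaction below the limit
```

For $\gamma = 7/5$ the factor is positive at $\mathrm{Ma}_n = -1.0$ and negative at $\mathrm{Ma}_n = -1.3$, which brackets the stated limit.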


EC-ES Fluxes 


We can also apply an entropy conservative (EC) flux that is used for interior element
interfaces and add an entropy stable dissipation term (ES) to compute the boundary
flux via the mirrored states (46). This is exactly the strategy proposed in Parsani
et al. [12] to get the boundary flux. Such an EC-ES flux is presented in Winters et al.
[15]

\[
\left( \overset{\leftrightarrow}{\mathbf{F}} \cdot \vec{n} \right)^*_{EC\text{-}ES} = \overset{\leftrightarrow}{\mathbf{F}}_{EC}(\mathbf{U}_L, \mathbf{U}_R) \cdot \vec{n} - \tfrac{1}{2}\, D H \left( \mathbf{W}_R - \mathbf{W}_L \right), \tag{59}
\]

where $D$ is a dissipation matrix and the matrix $H$, with $H [\![ \mathbf{W} ]\!] = [\![ \mathbf{U} ]\!]$, is carefully derived
from the left and right states. Details are given in [15], where two approaches for
the dissipation are distinguished. One is a Lax-Friedrichs-type dissipation, scaling
with the maximum eigenvalue $\lambda_{max} = |V_n| + c$ (referred to as 'EC-LF'). The other is
a Roe-type dissipation computed via the eigenstructure of the matrix $(D H)$ (referred
to as 'EC-Roe').

If we carefully insert the two mirrored boundary states into (59), we again get an
equation for the modified pressure

\[
\left( \frac{P^*}{P} \right)_{EC\text{-}LF} = 1 + \gamma\, \mathrm{Ma}_n \left( |\mathrm{Ma}_n| + 1 \right) \tag{60}
\]

for the Lax-Friedrichs-type dissipation and

\[
\left( \frac{P^*}{P} \right)_{EC\text{-}Roe} = 1 + \gamma\, \mathrm{Ma}_n \tag{61}
\]

for the Roe-type dissipation. Both approaches lead to an entropy stable boundary
flux when using a mirrored state. Note that the modified pressure of the EC-Roe
flux (61) exactly matches the one of the HLL flux (53).


4 Discussion 


In the previous section we have shown conditions under which a specified wall 
flux is stable. In the linear analysis, the central numerical flux adds no dissipation 
and is neutrally stable. In the nonlinear analysis, entropy is not generated if the 
numerical wall pressure is equal to the internal pressure, P* = P’. For upwinded 
approximations, the amount of energy or entropy dissipation depends on the normal 
Mach number. Since the boundary condition is only imposed weakly through the 
numerical flux, the normal Mach number will not be exactly zero except in the 
convergence limit. In fact, flow computations (especially steady state ones) are 
usually initiated with an impulsive start, where the initial state is a uniform flow, 
and the normal Mach number is not zero. This has proved over time to be very 
robust in practice. The analysis above gives an explanation why. 

In the linear analysis the dissipation due to imposing the boundary condition
is proportional to the square of the normal Mach number. With an impulsive
start initialization, this dissipation will be large. As the flow develops and the
boundary condition is better enforced, the dissipation reduces, going away only as
the approximate solution converges.

Fig. 1 Entropy contribution $\Delta s$ (62) produced by the wall boundary flux. RP refers to the exact
Riemann problem (45), LF to (50), EC-LF to (60), HLL to (53) and Roe to (57). Plotted over the
normal Mach number ranges $|\mathrm{Ma}_n| < 5$ on the top and restricted to $|\mathrm{Ma}_n| < 1$ on the bottom

A similar effect is observed for the use of the different approximate Riemann
solvers in the nonlinear analysis. In Fig. 1, we compare the entropy contribution

\[
\Delta s = (\rho c)\, \mathrm{Ma}_n \left( \frac{P^*}{P} - 1 \right) \tag{62}
\]

for the different wall boundary fluxes, over a range of normal Mach numbers for
$(\rho c) = 1$ and $\gamma = 7/5$. When the boundary condition is exactly fulfilled ($\mathrm{Ma}_n = 0$),
the entropy contribution is zero. For low normal Mach numbers, all fluxes have the
same behavior. Compared to the exact Riemann problem (RP), the Lax-Friedrichs
flux and the EC-LF flux always produce more entropy whereas the HLL flux
produces less entropy for impinging velocities $\mathrm{Ma}_n > 0$. The results of HLLC
and EC-Roe fluxes are not plotted, as they coincide with the HLL flux. As shown
in the analysis, the Roe flux produces a negative entropy change for supersonic
rarefactions, implying that it is not suitable for all flow configurations.
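The comparison can be reproduced from the closed-form pressure ratios alone. The sketch below tabulates $\Delta s$ with $(\rho c) = 1$ and $\gamma = 7/5$ for the exact Riemann solution, the Lax-Friedrichs, EC-LF and HLL fluxes; the sampling is restricted to $|\mathrm{Ma}_n| \le 1$, the function names are hypothetical, and the output is computed here rather than read from the figure.

```python
import math

GAMMA = 7.0 / 5.0


def ratio_exact(ma):
    """P*/P from the exact symmetric Riemann problem (shock or rarefaction)."""
    if ma > 0.0:
        k = 0.25 * (GAMMA + 1.0) * ma
        return 1.0 + GAMMA * ma * (k + math.sqrt(k * k + 1.0))
    return (1.0 + 0.5 * (GAMMA - 1.0) * ma) ** (2.0 * GAMMA / (GAMMA - 1.0))


def ratio_lf(ma):
    return 1.0 + GAMMA * ma * (ma + abs(ma) + 1.0)


def ratio_ec_lf(ma):
    return 1.0 + GAMMA * ma * (abs(ma) + 1.0)


def ratio_hll(ma):
    return 1.0 + GAMMA * ma


def delta_s(ratio, ma, rho_c=1.0):
    """Entropy contribution (rho c) * Ma_n * (P*/P - 1)."""
    return rho_c * ma * (ratio(ma) - 1.0)


samples = [k / 10.0 for k in range(-10, 11)]
fluxes = {"RP": ratio_exact, "LF": ratio_lf, "EC-LF": ratio_ec_lf, "HLL": ratio_hll}
table = {name: [delta_s(f, ma) for ma in samples] for name, f in fluxes.items()}

for ma, row in zip(samples, zip(*table.values())):
    print(f"Ma_n = {ma:+.1f}  " + "  ".join(f"{name}: {v:.4f}" for name, v in zip(table, row)))
```

On this range the tabulated values for LF and EC-LF lie at or above the RP values, and the HLL values lie at or below them for positive normal Mach numbers.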
On the Order Reduction of Entropy
Stable DGSEM for the Compressible
Euler Equations


Florian J. Hindenlang and Gregor J. Gassner 


1 Introduction 


Discontinuous Galerkin spectral element collocation methods (DGSEM) with either
Legendre-Gauss or Legendre-Gauss-Lobatto (LGL) nodes (see e.g. [14]) are among
the most efficient variants in the class of element based high order methods, such as
discontinuous Galerkin, flux reconstruction, or summation-by-parts (SBP) finite
differences. In particular, the LGL variant, starting in [9], turned out to be similar
to an SBP finite difference approximation with the simultaneous-approximation-term
(SAT) technique. This relationship made it possible to construct conservative skew-symmetric
approximations, e.g. [9, 10, 21], and later enabled DGSEM-LGL approximations
that are discretely entropy stable, e.g. [1, 3, 6, 8, 13, 17, 19, 20], and/or kinetic
energy preserving [12]. These novel variants of nodal split form DG methods feature
drastically increased non-linear robustness towards aliasing induced instabilities
and favourable properties regarding the simulation of unresolved turbulence, e.g.
[7, 23].

In addition to the very robust dissipative entropy stable versions, it is also
possible to construct virtually dissipation free variants by choosing appropriate
element interface numerical fluxes. These entropy conserving variants all show
an odd-even behavior when experimentally testing the order of convergence,
e.g. [9, 21], where the observed convergence order for even polynomial degrees
$N$ is $N$ and for odd $N$ is $N + 1$. Lately, a discussion emerged in the
community, with interesting debates during the recent ICOSAHOM conference in
London, where researchers reported non-optimal convergence behavior of the
entropy stable DGSEM-LGL even with dissipative numerical surface fluxes, e.g.
[6].


F. J. Hindenlang
Max Planck Institute for Plasma Physics, Garching, Germany
e-mail: florian.hindenlang@ipp.mpg.de

G. J. Gassner
Department for Mathematics and Computer Science, Center for Data and Simulation Science,
University of Cologne, Cologne, Germany
e-mail: ggassner@math.uni-koeln.de

© The Author(s) 2020
S. J. Sherwin et al. (eds.), Spectral and High Order Methods for Partial Differential
Equations ICOSAHOM 2018, Lecture Notes in Computational Science
and Engineering 134, https://doi.org/10.1007/978-3-030-39647-3_2

This paper contributes to this discussion and presents results of an experimental
convergence order study for the compressible Euler equations with (1) the standard
DGSEM with either Gauss or LGL nodes, and (2) the entropy stable DGSEM with
LGL nodes. For these nodal schemes, we test the convergence order with different
numerical surface fluxes and report the results depending on the Mach number of the
test case. The remainder of the paper is organized as follows: in the next section we
describe the numerical model for our numerical experiments, in Sect. 3 we present
our observed experimental convergence orders for different configurations, and in
Sect. 4 we draw our conclusions.


2 Numerical Model 


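We consider the compressible Euler equations defined in the domain $\Omega \subset \mathbb{R}^3$

\[
\mathbf{u}_t + \sum_{i=1}^{3} \frac{\partial \mathbf{f}_i}{\partial x_i} = 0. \tag{1}
\]

The state vector contains the conservative variables and the advective flux
components are

\[
\mathbf{u} = \begin{bmatrix} \varrho \\ \varrho v_1 \\ \varrho v_2 \\ \varrho v_3 \\ E \end{bmatrix}, \quad
\mathbf{f}_1 = \begin{bmatrix} \varrho v_1 \\ \varrho v_1^2 + p \\ \varrho v_1 v_2 \\ \varrho v_1 v_3 \\ (E + p)\,v_1 \end{bmatrix}, \quad
\mathbf{f}_2 = \begin{bmatrix} \varrho v_2 \\ \varrho v_2 v_1 \\ \varrho v_2^2 + p \\ \varrho v_2 v_3 \\ (E + p)\,v_2 \end{bmatrix}, \quad
\mathbf{f}_3 = \begin{bmatrix} \varrho v_3 \\ \varrho v_3 v_1 \\ \varrho v_3 v_2 \\ \varrho v_3^2 + p \\ (E + p)\,v_3 \end{bmatrix}. \tag{2}
\]

Here, $\varrho$, $\vec{v} = (v_1, v_2, v_3)^T$, $p$, $E$ are the mass density, fluid velocities, pressure and
total energy. We close the system with the ideal gas assumption, which relates the
total energy and pressure

\[
p = (\gamma - 1)\left(E - \tfrac{1}{2}\varrho\,\|\vec{v}\|^2\right), \tag{3}
\]

where $\gamma$ denotes the adiabatic coefficient.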
For our discretization, we subdivide the domain into non-overlapping hexahedral
elements. For each element, we define a transfinite mapping to a unit reference space
and use this mapping to transform Eq. (1) from physical to reference space. A
weak form is created by taking the inner product of the transformed equation with
a test function. We use integration-by-parts for the flux term and approximate the
resulting weak form as follows: the conservative variables are approximated by a
polynomial in reference space with degree $N$, interpolated at the Gauss or LGL
nodes. The volume fluxes are replaced by a standard interpolation of the non-linear
flux function at the same Gauss/LGL nodes (standard DGSEM-Gauss or
DGSEM-LGL), see e.g. [14]. For the LGL variant, we are also able to introduce the split
form volume integral based on entropy conserving and kinetic energy preserving
numerical volume fluxes (Split-DGSEM), e.g. [12] and [22], resulting in either the
entropy conserving or entropy stable DGSEM variants, depending on the choice of
numerical surface flux.


3 Convergence Results 


In this section, we compare the convergence of the standard DGSEM and the 
entropy conservative and entropy stable discretization for different choices of the 
numerical flux and polynomial degrees N = 2, 3, 4, 5. 

We choose the test case of a two-dimensional density wave, with a constant 
pressure and transported with a constant velocity, which was proposed for one- 
dimensional convergence tests in [4]. The density evolves as 


Q(x1, x2, t) = 1 + 0.1 sin (x (Ga = vit) + (х2 — v2t))) (4) 


with a prescribed velocity $(v_1, v_2)$. The pressure is chosen as $p = 1/\gamma$ with
$\gamma = 1.4$, so that the sound speed ranges between $c = 0.95 \ldots 1.05$. Thus, by
changing the velocity, we change the Mach number of the flow, $\mathrm{Ma} = |v|/c$. Three
Mach numbers are chosen: $\mathrm{Ma} \approx 0.2$ with $(v_1, v_2) = (0.1, 0.15)$, $\mathrm{Ma} \approx 1.0$ with
$(v_1, v_2) = (0.7, 0.65)$, and $\mathrm{Ma} \approx 3.5$ with $(v_1, v_2) = (2.5, 2.4)$. The experimental
order of convergence (EOC) is computed with the $L^2$ error of the density at
$t = 1$.
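The quoted sound speed range and Mach numbers can be cross-checked with a few lines of Python. This is a minimal sketch and not part of FLUXO: the function names are hypothetical, the ideal-gas sound speed $c = \sqrt{\gamma p / \varrho}$ is assumed, and the argument of the sine in Eq. (4) is assumed to be $\pi((x_1 - v_1 t) + (x_2 - v_2 t))$.

```python
import math

GAMMA = 1.4               # adiabatic coefficient from the text
PRESSURE = 1.0 / GAMMA    # constant pressure p = 1/gamma


def density(x1, x2, t, v1, v2):
    """Density wave of Eq. (4); the factor pi in the argument is an assumption."""
    return 1.0 + 0.1 * math.sin(math.pi * ((x1 - v1 * t) + (x2 - v2 * t)))


def sound_speed(rho, p=PRESSURE, gamma=GAMMA):
    """Ideal-gas sound speed c = sqrt(gamma * p / rho)."""
    return math.sqrt(gamma * p / rho)


def mach_range(v1, v2):
    """Smallest and largest Mach number |v|/c over the density range [0.9, 1.1]."""
    speed = math.hypot(v1, v2)
    c_max = sound_speed(0.9)   # lowest density gives the largest sound speed
    c_min = sound_speed(1.1)   # highest density gives the smallest sound speed
    return speed / c_max, speed / c_min


for velocity in [(0.1, 0.15), (0.7, 0.65), (2.5, 2.4)]:
    low, high = mach_range(*velocity)
    print(f"v = {velocity}: Ma between {low:.2f} and {high:.2f}")
```

Because the density varies by ten percent around one, each velocity corresponds to a narrow band of Mach numbers rather than a single value, which is why the Mach numbers in the text are approximate.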

The convergence study is performed with the open source, three-dimensional
curvilinear split-form DG framework FLUXO (www.github.com/project-fluxo). As
the test case is two-dimensional, we use fully periodic Cartesian meshes of the
domain $[-1, 1]^2$ with an equal number of elements in the x- and y-directions and always
one element in the z-direction. Note that $h_0$ in the convergence tables refers to the
coarsest mesh level, which is $4^2$ elements for $N = 2, 3$ ($h_0 = 1/2$) and $2^2$ elements
for $N = 4, 5$ ($h_0 = 1$).

All simulation results are obtained with an explicit five stage, fourth order
accurate low storage Runge-Kutta scheme [2], where a stable time step is computed
according to the adjustable coefficient $\mathrm{CFL} \in (0, 1]$, the local maximum wave
speed, and the relative grid size, e.g. [11]. We made sure that the time integrator
did not influence the spatial convergence order, by adjusting the CFL number 
accordingly. 
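The three ingredients of the time step restriction can be combined as in the following sketch. It is an illustration only: the text does not give the formula, so the "relative grid size" is assumed here to be the element size divided by N + 1, and all function names are hypothetical. The definition used in [11] or in FLUXO may differ.

```python
def stable_time_step(cfl, element_size, degree, max_wave_speed):
    """Time step estimate dt = CFL * (h / (N + 1)) / lambda_max.

    The scaling h / (N + 1) for the relative grid size is an assumption
    of this sketch, not a formula taken from the paper.
    """
    if not 0.0 < cfl <= 1.0:
        raise ValueError("CFL must lie in (0, 1]")
    return cfl * element_size / ((degree + 1) * max_wave_speed)


def global_time_step(cfl, elements):
    """Most restrictive local time step over (h, N, lambda_max) triples."""
    return min(stable_time_step(cfl, h, n, lam) for h, n, lam in elements)


# Example: h = 1/2, N = 3, lambda_max = |v| + c = 2 (illustrative numbers).
print(stable_time_step(0.5, 0.5, 3, 2.0))
```

Lowering the CFL coefficient reduces the temporal error until the spatial error dominates, which is how one can make sure that the time integrator does not mask the spatial convergence order.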


3.1 Standard DGSEM 


The convergence of the standard DGSEM with Gauss-Legendre nodes (DGSEM-
Gauss) and with Legendre-Gauss-Lobatto nodes (DGSEM-LGL) is shown in Tables 1
and 2, for the three Mach numbers and two choices of the numerical flux,
namely the HLL (Harten, Lax, van Leer) flux and the Roe flux. The results
of the LLF (local Lax-Friedrichs) flux and the HLLC flux (HLL variant with
three waves, C for 'contact' wave) are reported in the Appendix, as the HLL
results are similar to LLF, and HLLC behaves exactly the same as Roe, see
Tables 4 and 5. Details on the properties and the implementation of the LLF, HLL,
HLLC, and Roe fluxes are found in the book of Toro [18] and the references 
therein. 

For the HLL flux and the low Mach number $\mathrm{Ma} \approx 0.2$, we observe an odd-even
behavior with an order reduction for even polynomial degrees N = 2, 4. Also for
$\mathrm{Ma} \approx 1.0$, the convergence for even degrees is slightly affected, whereas for the
high Mach number, all fluxes converge with full order. Comparing the $L^2$ errors on
the finest mesh for HLL and Roe at the low Mach number, HLL is less accurate for
N = 2, 4 and more accurate for N = 3, 5.
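For reference, the experimental order of convergence between two mesh levels is the logarithmic error ratio divided by the logarithmic mesh size ratio, and "order reduction" means that this value stays below the design order N + 1. The sketch below uses synthetic errors, not data from the convergence tables; the function name is hypothetical.

```python
import math


def eoc(errors, mesh_sizes):
    """Experimental order of convergence between consecutive mesh levels.

    EOC_k = log(e_{k-1} / e_k) / log(h_{k-1} / h_k)
    """
    orders = []
    for k in range(1, len(errors)):
        orders.append(math.log(errors[k - 1] / errors[k])
                      / math.log(mesh_sizes[k - 1] / mesh_sizes[k]))
    return orders


# Synthetic errors (NOT results from the paper): e = 0.3 * h^(N+1) with N = 3.
N = 3
h0 = 0.5                                       # coarsest level for N = 2, 3
mesh_sizes = [h0 / 2 ** level for level in range(4)]
errors = [0.3 * h ** (N + 1) for h in mesh_sizes]
print(eoc(errors, mesh_sizes))                 # every entry is close to N + 1 = 4
```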

All numerical fluxes are approximate Riemann solvers, but the LLF and HLL
only use the maximum wave speeds, whereas the HLLC and Roe also take the 
contact wave into account, and therefore keep the full order of the scheme for all 
Mach numbers for this test case. 
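The effect of using only the maximum wave speed can be seen on a pure contact discontinuity. The following sketch implements a textbook local Lax-Friedrichs flux for the one-dimensional Euler equations; it is not the FLUXO implementation, and all names are chosen for illustration. For a density jump with constant velocity and pressure, the LLF mass flux exceeds the exact upwind value by a term proportional to the sound speed, not to $|v|$.

```python
import math

GAMMA = 1.4


def euler_flux(u):
    """Physical flux of the 1D Euler equations for u = (rho, rho*v, E)."""
    rho, mom, energy = u
    v = mom / rho
    p = (GAMMA - 1.0) * (energy - 0.5 * rho * v * v)
    return (mom, mom * v + p, (energy + p) * v)


def max_wave_speed(u):
    """Largest characteristic speed |v| + c of the state u."""
    rho, mom, energy = u
    v = mom / rho
    p = (GAMMA - 1.0) * (energy - 0.5 * rho * v * v)
    return abs(v) + math.sqrt(GAMMA * p / rho)


def llf_flux(u_left, u_right):
    """Local Lax-Friedrichs flux: central flux minus lambda_max/2 times the jump."""
    lam = max(max_wave_speed(u_left), max_wave_speed(u_right))
    f_left, f_right = euler_flux(u_left), euler_flux(u_right)
    return tuple(
        0.5 * (fl + fr) - 0.5 * lam * (ur - ul)
        for fl, fr, ul, ur in zip(f_left, f_right, u_left, u_right)
    )


def conservative(rho, v, p):
    """Conservative state (rho, rho*v, E) from primitive variables."""
    return (rho, rho * v, p / (GAMMA - 1.0) + 0.5 * rho * v * v)


# Pure contact: the density jumps, velocity and pressure are constant.
v, p = 0.18, 1.0 / GAMMA
left, right = conservative(1.05, v, p), conservative(0.95, v, p)
mass_flux = llf_flux(left, right)[0]
upwind_mass_flux = left[0] * v           # exact upwind value for v > 0
extra_mass_flux = mass_flux - upwind_mass_flux
print(extra_mass_flux)                   # equals 0.05 * (lambda_max - v) > 0
```

A flux that resolves the contact wave, such as HLLC or Roe, would scale the dissipation of this jump with $|v|$ instead.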


3.2 Entropy Conservative and Entropy Stable DGSEM 


Now, we investigate the order reduction of the entropy conservative and entropy
stable discretizations. Here, the standard DGSEM volume integral is replaced by
a split-form formulation (Split-DGSEM) using a two-point entropy conservative and
kinetic energy preserving flux (ECKEP). If we choose the ECKEP flux at the
surface, we get an entropy-conserving scheme. For entropy stability, we can use the
LLF or HLL flux directly at the surface, or use the ECKEP flux and add a dissipation
term, which must still satisfy the entropy inequality condition. In Winters et al. [22],
such dissipation terms are carefully derived, using either only the maximum wave
speed (LLF-type) or incorporating all waves (Roe-type), which we will refer to as
ECKEP-LLF and ECKEP-Roe fluxes.

In Table 3, we summarize the convergence of the dissipation-free ECKEP flux, 
the HLL and ECKEP-Roe flux. The results for LLF and ECKEP-LLF fluxes are 
found in the Appendix in Table 6, as they have the same convergence and error levels 
as the HLL flux. As expected, the dissipation-free surface flux (ECKEP) produces 
an order reduction for all Mach numbers for N = 3,5, and for N = 2 full order is 
not kept in the last refinement step. 

If we simply use the HLL flux, we have an entropy stable scheme, but an order
reduction for N = 2, 4 can be observed for the low Mach number flow, analogously
to the standard DGSEM-LGL scheme. Interestingly, the odd-even behavior switches 
between entropy conserving and entropy stable fluxes. 

The ECKEP-Roe entropy stable flux accounts for all waves of the Riemann 
problem and adjusts the dissipation for each wave accordingly, which gives full 
order convergence for all Mach numbers. 


4 Conclusions 


In this work, we compare the convergence of the standard DGSEM Gauss and
Gauss-Lobatto schemes with that of entropy conservative (EC) and entropy stable (ES) DGSEM
schemes for the Euler equations, as there have been findings of order reduction for
EC and ES schemes. We choose a simple density transport test case on a periodic
domain and investigate the influence of the Mach number of the transport velocity. 

The EC scheme is dissipation free and an order reduction is observed by the 
convergence study presented here, confirming many similar observations found in 
literature. We also confirm that the ES scheme can have an order reduction for 
low Mach numbers, but only if the entropy stable numerical flux relies on simple 
approximate Riemann solvers such as local Lax-Friedrichs or HLL. If all waves are 
accounted for in the dissipation term of the entropy stable flux as presented in [22], 
the full order is observed for all Mach numbers. In addition, we reproduce the same 
behavior for the standard DGSEM Gauss and Gauss-Lobatto schemes, where the 
LLF and HLL fluxes suffer from order reduction at low Mach number, and HLLC 
and Roe fluxes have full order for all Mach numbers. 

We want to emphasize that the present convergence study should be seen merely 
as an observation, confirming that the numerical flux can have strong influence 
on the convergence order for both the standard DGSEM and the entropy stable 
DGSEM. Also, we stress that in our tests the order reduction is related to the form of 
the dissipation term in the numerical surface flux and is not related to the insufficient 
integration precision of the LGL-quadrature. 

Based on the observations presented in this work, a possible explanation for the 
loss of convergence for the density transport at low Mach numbers when using LLF 
and HLL fluxes is the form of dissipation from the approximate Riemann solver. 
In the case of the density transport, the exact solution follows the characteristic
with velocity $v$. However, the approximate Riemann solvers LLF and HLL consider
only two waves with maximum velocity $\approx (|v| + c)$ and do not consider the
contact wave with velocity $v$. Thus, the contact wave is dissipated proportional to
$\approx (|v| + c)$ and not to $|v|$. For low Mach numbers, where $c > |v|$, this causes
over-upwinding. Over-upwinding was discussed in [5, 15]. It is not intuitive at 
first, but over-upwinding (over-penalization) can lead to a reduction of the in-built 
dissipation of the DG scheme, getting wave-propagation characteristics similar to 
a continuous Galerkin method [16]. This loss of in-built dissipation could be an 
explanation for the even-odd behavior we observed. However, it is still unclear why 
numerical surface fluxes with no in-built dissipation that are symmetric, e.g. EC flux, 
lead to an odd-even behavior in the convergence order and why numerical surface 
fluxes with over-upwinding, i.e. reduced dissipation due to over-penalization, cause 
an opposite even-odd behavior. What supports the explanation is the recovery of full 
convergence order for LLF and HLL when the difference in wave speed becomes 
smaller for higher Mach numbers, i.e. no over-upwinding. In contrast to LLF and 
HLL, the HLLC and Roe solvers take specifically the contact wave into account 
and adjust the dissipation accordingly and thus avoid strong over-upwinding by 
construction. In our tests, we always observe full convergence order for all Mach 
numbers for HLLC and Roe. 
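The size of the over-upwinding can be quantified by the ratio of the dissipation speed used by LLF and HLL to the speed of the contact wave, $(|v| + c)/|v| = 1 + 1/\mathrm{Ma}$. The short sketch below evaluates this ratio for the three nominal Mach numbers of the study; the function name is hypothetical.

```python
def over_upwinding_ratio(mach):
    """Ratio (|v| + c) / |v| = 1 + 1/Ma of dissipation speed to contact speed."""
    if mach <= 0.0:
        raise ValueError("Mach number must be positive")
    return 1.0 + 1.0 / mach


for mach in (0.2, 1.0, 3.5):
    print(f"Ma = {mach}: ratio = {over_upwinding_ratio(mach):.2f}")
```

The ratio is 6 at Ma = 0.2, 2 at Ma = 1.0, and about 1.29 at Ma = 3.5, which matches the observation that the order reduction fades as the Mach number grows.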

Lastly, we note that a convergence study using a manufactured solution technique
can be misleading, as full convergence order is found independent of the choice of
numerical flux. Hence, the introduction of a source term to balance the prescribed
solution overcomes possible deficiencies of the surface fluxes, showing the limit
of the manufactured solution technique in this context. In the Appendix, the
convergence results of a manufactured solution are reported.


Appendix
Additional Convergence Results 


In this section, we present additional convergence results of the density wave test 
case for the DGSEM-Gauss and DGSEM-LGL with LLF and HLLC fluxes in 
Table 4 and Table 5, and also the entropy stable schemes with LLF and ECKEP-LLF
fluxes in Table 6. The results for the LLF-type fluxes behave like those for the HLL flux,
and the results for the HLLC flux like those for the Roe-type fluxes presented in Table 3.


Manufactured Solution with Source Term


Here, we run a convergence test with the method of manufactured solutions. To do 
so, we assume a two-dimensional solution of the form 


\[ u = [\varrho, \varrho v_1, \varrho v_2, \varrho v_3, E]^T = [g, g, g, 0, g^2]^T \tag{5} \]


with $g = g(x_1, x_2, t) = 0.5 \sin(2\pi (x_1 + x_2 - t)) + 2$.


Note that the average Mach number in the domain is $\mathrm{Ma} = 0.8$. Inserting (5) into
the Euler equations, and using the fact that the spatial and time derivatives are
$g' = \partial_{x_1} g = \partial_{x_2} g = -\partial_t g$, we get an additional residual
g' 
3 at, Gy — 2)g' +2(y — Dgg' 
— =] By – 2)6' +2(у- Degg’ 6 
as Gy — 2)g' +2(у — 188 (6) 
=l 0 


(бу — 2)g' + 2(2y — 1)gg' 


To solve the inhomogeneous problem, we subtract the residual from the approximate 
solution in each Runge-Kutta step. Moreover, we run the test case up to the final 
time t= 1.0. 
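The derivative relation $g' = \partial_{x_1} g = \partial_{x_2} g = -\partial_t g$ that underlies the residual can be checked numerically. The sketch below assumes that the argument of the sine is $2\pi(x_1 + x_2 - t)$; the helper names are hypothetical and the finite difference is only a sanity check, not part of the solver.

```python
import math


def g(x1, x2, t):
    """Manufactured profile, assuming the argument 2*pi*(x1 + x2 - t)."""
    return 0.5 * math.sin(2.0 * math.pi * (x1 + x2 - t)) + 2.0


def g_prime(x1, x2, t):
    """Analytical derivative of g with respect to its argument x1 + x2 - t."""
    return math.pi * math.cos(2.0 * math.pi * (x1 + x2 - t))


def central_difference(func, point, index, step=1e-6):
    """Second-order finite difference of func in coordinate number `index`."""
    forward = list(point)
    backward = list(point)
    forward[index] += step
    backward[index] -= step
    return (func(*forward) - func(*backward)) / (2.0 * step)


point = (0.3, -0.4, 0.25)
dg_dx1 = central_difference(g, point, 0)
dg_dx2 = central_difference(g, point, 1)
dg_dt = central_difference(g, point, 2)
print(dg_dx1, dg_dx2, -dg_dt, g_prime(*point))   # four nearly identical numbers
```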

In the convergence results for the standard DGSEM Gauss and Gauss-Lobatto, 
we see that the LLF flux still leads to an order reduction for N = 2,4, 
whereas full order is found for the HLL, HLLC and Roe fluxes, see Tables 7 
and 8. 

In Table 9, the entropy conservative scheme again shows an order reduction for
N = 3, 5, as does the LLF-type dissipation for N = 2, 4. For this test case, all
other entropy stable schemes exhibit full order.


A Review of Regular Decompositions
of Vector Fields: Continuous, Discrete,
and Structure-Preserving


Ralf Hiptmair and Clemens Pechstein 


1 Introduction 


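For a bounded Lipschitz domain $\Omega \subset \mathbb{R}^3$ recall the classical $L^2$-orthogonal
Helmholtz decompositions


\[ \mathbf{L}^2(\Omega) = \nabla H^1_0(\Omega) \oplus \mathbf{H}(\mathrm{div}\,0, \Omega) = \nabla H^1(\Omega) \oplus \mathbf{H}_0(\mathrm{div}\,0, \Omega), \]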


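see, e.g., [9, Ch. XI, Sect. I]. They can be used to derive decompositions of
(subspaces of) $\mathbf{H}(\mathbf{curl}, \Omega)$:


\[ \mathbf{H}_0(\mathbf{curl}, \Omega) = \nabla H^1_0(\Omega) \oplus \mathbf{X}_N(\Omega), \qquad \mathbf{X}_N(\Omega) := \mathbf{H}_0(\mathbf{curl}, \Omega) \cap \mathbf{H}(\mathrm{div}\,0, \Omega), \]
\[ \mathbf{H}(\mathbf{curl}, \Omega) = \nabla H^1(\Omega) \oplus \mathbf{X}_T(\Omega), \qquad \mathbf{X}_T(\Omega) := \mathbf{H}(\mathbf{curl}, \Omega) \cap \mathbf{H}_0(\mathrm{div}\,0, \Omega). \]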


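If the domain $\Omega$ is convex then the respective complementary space, $\mathbf{X}_N(\Omega)$
or $\mathbf{X}_T(\Omega)$, is continuously embedded in the space $\mathbf{H}^1(\Omega)$ of vector fields with
Cartesian components in $H^1(\Omega)$, cf. [1]. Then one can, for instance, write any
$\mathbf{u} \in \mathbf{H}(\mathbf{curl}, \Omega)$ as


\[ \mathbf{u} = \nabla p + \mathbf{z} \tag{1} \]

with $p \in H^1(\Omega)$ and $\mathbf{z} \in \mathbf{H}^1(\Omega)$. Since $\|\nabla p\|_{L^2(\Omega)} \le \|\mathbf{u}\|_{L^2(\Omega)}$ one obtains
(using the continuous embedding) the stability property¹


\[ \|\nabla p\|_{L^2(\Omega)} + \|\mathbf{z}\|_{H^1(\Omega)} \le C\, \|\mathbf{u}\|_{\mathbf{H}(\mathbf{curl}, \Omega)}. \tag{2} \]


A similar decomposition can be found for $\mathbf{u} \in \mathbf{H}_0(\mathbf{curl}, \Omega)$.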

Generally, a decomposition of form (1) with the stability property (2) is called a
regular decomposition, even if $L^2$-orthogonality does not hold. Actually, it turns out
that (1)-(2) can be achieved even in cases where $\Omega$ is non-convex, in particular on
non-smooth domains, or in cases where $\Omega$ or its boundary have non-trivial topology;
only the $L^2$-orthogonality has to be sacrificed, cf. [20].

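Noting that $\nabla H^1(\Omega)$ is contained in the kernel of the curl operator and that,
under mild smoothness assumptions on the domain, the whole kernel is spanned by
$\nabla H^1(\Omega)$ plus a finite-dimensional co-homology space [15, Sect. 4], one can achieve
a second decomposition,


\[ \mathbf{u} = \mathbf{h} + \mathbf{z}, \tag{3} \]

with $\mathbf{h} \in \mathrm{Ker}(\mathbf{curl}|_{\mathbf{H}(\mathbf{curl}, \Omega)})$ and $\mathbf{z} \in \mathbf{H}^1(\Omega)$, where


\[ \|\mathbf{h}\|_{L^2(\Omega)} \le C\, \|\mathbf{u}\|_{\mathbf{H}(\mathbf{curl}, \Omega)}, \qquad \|\mathbf{z}\|_{H^1(\Omega)} \le C\, \|\mathbf{curl}\, \mathbf{u}\|_{L^2(\Omega)}. \tag{4} \]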


The second stability estimate states that if $\mathbf{u}$ is already in the kernel of the curl
operator, then $\mathbf{z}$ is zero. Hence, (1) the operator mapping $\mathbf{u}$ to $\mathbf{h}$ is a projection onto the
kernel space and (2) the complement operator projects $\mathbf{u}$ to the function $\mathbf{z}$ of higher
regularity $\mathbf{H}^1(\Omega)$. For trivial topology of $\Omega$ and $\partial\Omega$, the two decompositions
(1)-(2) and (3)-(4) coincide.

As a few among many more [17, Sect. 1.5], we would like to highlight two 
important applications of these regular decompositions. 


1. The second form (3)-(4), in the sequel called rotation-bounded decomposition,
can be used to show that the operator underlying a certain boundary value
problem for Maxwell's equations is a Fredholm operator. The key point is
that the complement space of the kernel (from the view of the mentioned
projections) is $\mathbf{H}^1(\Omega)$, which is compactly embedded in $\mathbf{L}^2(\Omega)$, see e.g., [14, 16]
and references therein.

2. The first form (1)-(2), in the sequel called gradient-based decomposition, has
been used to generate stable three-term splittings of a finite element subspace
of $\mathbf{H}(\mathbf{curl}, \Omega)$, cf. [19-21, 23], which allows the construction of so-called
fictitious or auxiliary space preconditioners for the ill-conditioned system matrix
underlying the discretized Maxwell equations.


¹ Here and below C stands for a positive "generic constant" that may depend only on $\Omega$, unless
specified otherwise.

In both applications, it is desirable to obtain the decompositions for minimal
smoothness of the domain, e.g., Lipschitz domains, which are not necessarily
convex. Moreover, it is also desirable to go beyond decompositions of the entire
space $\mathbf{H}(\mathbf{curl}, \Omega)$ and extend them to subspaces for which the appropriate trace
vanishes on a "Dirichlet part" $\Gamma_D$ of the boundary. In this case traces of the two
summands should also vanish on $\Gamma_D$.

In the present paper, we provide regular decompositions of both types for subspaces
of $\mathbf{H}(\mathbf{curl}, \Omega)$ (in Sect. 3) and $\mathbf{H}(\mathrm{div}, \Omega)$ (in Sect. 4) comprising functions
with vanishing trace on a part $\Gamma_D$ of the boundary $\partial\Omega$ for Lipschitz domains $\Omega$
of arbitrary topology. In particular, $\Omega$ is allowed to have handles, and $\partial\Omega$ and $\Gamma_D$
may have several connected components. The Dirichlet boundary $\Gamma_D$ must satisfy a
certain smoothness assumption that we shall introduce in Sect. 2. In addition to the
stability estimates (2) and (4), we show that the decompositions are stable even in
$\mathbf{L}^2(\Omega)$.

In the final part of the manuscript, in Sect. 5, we establish regular decompositions
of spaces of Whitney forms, which are lowest-order conforming finite element
subspaces of $\mathbf{H}(\mathbf{curl}, \Omega)$ and $\mathbf{H}(\mathrm{div}, \Omega)$, respectively, built upon simplicial
triangulations of $\Omega$.

This note is based on [17] and is an abridged version of [18]. Please refer to this 
latter preprint for complete proofs of the results quoted below. 


2 Preliminaries 


Since subtle geometric arguments will play a major role for parts of the theory, we
start with a precise characterization of the geometric setting: Let $\Omega \subset \mathbb{R}^3$ be an
open, bounded, connected Lipschitz domain.² We write $d(\Omega)$ for its diameter. Its
boundary $\Gamma := \partial\Omega$ is partitioned according to $\Gamma = \Gamma_D \cup \Sigma \cup \Gamma_N$, with relatively
open sets $\Gamma_D$ and $\Gamma_N$. We assume that this provides a piecewise $C^1$ dissection of
$\partial\Omega$ in the sense of [12, Definition 2.2]. Sloppily speaking, this means that $\Sigma$ is the
union of closed curves that are piecewise $C^1$.

Under the above assumptions on $\Omega$ and $\Gamma_D$, [12, Lemma 4.4] guarantees the
existence of an open Lipschitz neighborhood $\Omega_\Gamma$ ("Lipschitz collar") of $\Gamma$ and of a
"bulge" $\Upsilon_D \subset \Omega_\Gamma \setminus \overline{\Omega}$. We recall the properties of bulge domains from [12, Sect. 2,
Thm. 2.3], also stated in [17, Thm. 2.2]:


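Theorem 1 (Bulge-Augmented Domain) There exists a Lipschitz domain
$\Upsilon_D \subset \mathbb{R}^3 \setminus \overline{\Omega}$, such that $\overline{\Upsilon}_D \cap \overline{\Omega} = \overline{\Gamma}_D$, $\Omega^e := \Upsilon_D \cup \Gamma_D \cup \Omega$ is Lipschitz, $d(\Omega^e) \le 2\,d(\Omega)$,
and $\Upsilon_D \subset \Omega_\Gamma$. Moreover, each connected component $\Gamma_{D,k}$ of $\Gamma_D$ corresponds to a
connected component $\Upsilon_{D,k}$ of $\Upsilon_D$, and these have positive distance from each other.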


² Strongly Lipschitz, in the sense that the boundary is locally the graph of a Lipschitz continuous
function.

Let


\[ H^1_{\Gamma_D}(\Omega) := \{u \in H^1(\Omega) : (\gamma u)|_{\Gamma_D} = 0\}, \]
\[ \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) := \{\mathbf{u} \in \mathbf{H}(\mathbf{curl}, \Omega) : (\gamma_\tau \mathbf{u})|_{\Gamma_D} = 0\}, \]
\[ \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega) := \{\mathbf{u} \in \mathbf{H}(\mathrm{div}, \Omega) : (\gamma_n \mathbf{u})|_{\Gamma_D} = 0\}, \]

denote the standard Sobolev spaces where the distributional gradient, curl, or
divergence is in $L^2$ and where the pointwise trace $\gamma u$, the tangential trace $\gamma_\tau \mathbf{u}$,
or the normal trace $\gamma_n \mathbf{u}$, respectively, vanishes on the Dirichlet boundary $\Gamma_D$, see
e.g. [3, 6, 26]. These spaces are linked via the de Rham complex,


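\[ K_{\Gamma_D}(\Omega) \longrightarrow H^1_{\Gamma_D}(\Omega) \xrightarrow{\;\nabla\;} \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) \xrightarrow{\;\mathbf{curl}\;} \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega) \xrightarrow{\;\mathrm{div}\;} L^2(\Omega), \tag{5} \]


where


\[ K_{\Gamma_D}(\Omega) := \{v \in H^1_{\Gamma_D}(\Omega) : v = \mathrm{const}\} = \begin{cases} \mathrm{span}\{1\}, & \text{if } \Gamma_D = \emptyset, \\ \{0\}, & \text{otherwise.} \end{cases} \]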


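The range of each operator in (5) lies in the kernel space of the succeeding one, cf.
[3, Lemma 2.2]. We define


\[ \mathbf{H}_{\Gamma_D}(\mathbf{curl}\,0, \Omega) := \{\mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) : \mathbf{curl}\,\mathbf{v} = 0\}, \]
\[ \mathbf{H}_{\Gamma_D}(\mathrm{div}\,0, \Omega) := \{\mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega) : \mathrm{div}\,\mathbf{v} = 0\}. \tag{6} \]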


Barring topological obstructions these kernels can be represented through potentials:
Let $\beta_1(\Omega)$ denote the first Betti number of $\Omega$ (the number of "handles") and
$\beta_2(\Omega)$ the second Betti number (the number of connected components of $\partial\Omega$ minus
one). By the very definition of the Betti numbers as dimensions of co-homology
spaces we have


\[ \beta_1(\Omega) = 0 \;\Longrightarrow\; \mathbf{H}(\mathbf{curl}\,0, \Omega) = \nabla H^1(\Omega), \tag{7} \]
\[ \beta_2(\Omega) = 0 \;\Longrightarrow\; \mathbf{H}(\mathrm{div}\,0, \Omega) = \mathbf{curl}\, \mathbf{H}(\mathbf{curl}, \Omega), \tag{8} \]


cf. [26]. We call $\Omega$ topologically trivial if $\beta_1(\Omega) = \beta_2(\Omega) = 0$.
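For a triangulated domain, Betti numbers can be computed from the ranks of the simplicial boundary matrices, $\beta_k = \dim C_k - \mathrm{rank}\,\partial_k - \mathrm{rank}\,\partial_{k+1}$. The sketch below is an illustration with hypothetical helper names and is not taken from the references. It treats a single solid tetrahedron, which is topologically trivial, and a hollow triangle, a one-dimensional complex used here only to show a nonzero first Betti number.

```python
from fractions import Fraction
from itertools import combinations


def closure(top_simplices):
    """All faces of the given simplices, grouped by dimension 0..3."""
    cells = {k: set() for k in range(4)}
    for simplex in top_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            for face in combinations(simplex, size):
                cells[size - 1].add(face)
    return {k: sorted(faces) for k, faces in cells.items()}


def boundary_matrix(simplices, faces):
    """Signed incidence matrix of the boundary map from k-simplices to (k-1)-faces."""
    row = {face: i for i, face in enumerate(faces)}
    matrix = [[0] * len(simplices) for _ in faces]
    for col, simplex in enumerate(simplices):
        for drop in range(len(simplex)):
            face = simplex[:drop] + simplex[drop + 1:]
            matrix[row[face]][col] = (-1) ** drop
    return matrix


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def betti_numbers(top_simplices):
    """Return (b0, b1, b2) of the simplicial complex spanned by top_simplices."""
    cells = closure(top_simplices)
    ranks = {0: 0, 4: 0}
    for k in (1, 2, 3):
        ranks[k] = rank(boundary_matrix(cells[k], cells[k - 1]))
    return tuple(len(cells[k]) - ranks[k] - ranks[k + 1] for k in (0, 1, 2))


solid_tetrahedron = [(0, 1, 2, 3)]
hollow_triangle = [(0, 1), (1, 2), (0, 2)]
print(betti_numbers(solid_tetrahedron))   # (1, 0, 0): topologically trivial
print(betti_numbers(hollow_triangle))     # (1, 1, 0): one non-contractible loop
```

Exact rational arithmetic is used for the rank so that the result does not depend on a floating-point tolerance.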


3 Regular Decompositions and Potentials Related to H(curl) 


Throughout we rely on the properties of $\Omega$ and $\Gamma_D$ as introduced in Sect. 2 and
use the notations from Theorem 1. We write C for positive "generic constants" and
say that a constant "depends only on the shape of $\Omega$ and $\Gamma_D$", if it depends on the
geometric setting alone, but is invariant with respect to similarity transformations.
To achieve this the diameter of $\Omega$ will have to enter the estimates; we denote it by
$d(\Omega)$.


3.1 Gradient-Based Regular Decomposition of H (curl) 


The following theorem is essentially [17, Thm. 2.1]. 


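Theorem 2 (Gradient-Based Regular Decomposition of H(curl)) Let $(\Omega, \Gamma_D)$
satisfy the assumptions of Sect. 2. Then for each $\mathbf{u} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$ there exist
$\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ and $p \in H^1_{\Gamma_D}(\Omega)$ depending linearly on $\mathbf{u}$ such that


(i) $\mathbf{u} = \mathbf{z} + \nabla p$,
(ii) $\|\mathbf{z}\|_{0,\Omega} + \|\nabla p\|_{0,\Omega} \le C\, \|\mathbf{u}\|_{0,\Omega}$,
(iii) $\|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C \left( \|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{u}\|_{0,\Omega} \right)$,


with constants depending only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.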
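The role of the factors $1/d(\Omega)$ in estimate (iii) can be illustrated by a scaling argument. The following is a sketch under the assumption that fields are transplanted to the scaled domain $\Omega_s := s\,\Omega$, $s > 0$, by $\mathbf{u}_s(x) := \mathbf{u}(x/s)$; this notation is introduced here for illustration only.

```latex
% Scaling of the norms in three dimensions for u_s(x) := u(x/s) on \Omega_s := s\Omega
\begin{align*}
  \|\mathbf{u}_s\|_{0,\Omega_s} &= s^{3/2}\, \|\mathbf{u}\|_{0,\Omega}, &
  \|\nabla \mathbf{u}_s\|_{0,\Omega_s} &= s^{1/2}\, \|\nabla \mathbf{u}\|_{0,\Omega}, \\
  \|\mathbf{curl}\, \mathbf{u}_s\|_{0,\Omega_s} &= s^{1/2}\, \|\mathbf{curl}\, \mathbf{u}\|_{0,\Omega}, &
  d(\Omega_s) &= s\, d(\Omega).
\end{align*}
% Hence both sides of (iii) carry the same factor s^{1/2}:
\begin{align*}
  \|\nabla \mathbf{z}_s\|_{0,\Omega_s} + \frac{1}{d(\Omega_s)} \|\mathbf{z}_s\|_{0,\Omega_s}
  &= s^{1/2} \Bigl( \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \Bigr), \\
  \|\mathbf{curl}\, \mathbf{u}_s\|_{0,\Omega_s} + \frac{1}{d(\Omega_s)} \|\mathbf{u}_s\|_{0,\Omega_s}
  &= s^{1/2} \Bigl( \|\mathbf{curl}\, \mathbf{u}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{u}\|_{0,\Omega} \Bigr).
\end{align*}
```

Since both sides scale alike, a constant C that is admissible on $\Omega$ remains admissible on $s\,\Omega$, which is consistent with C depending on the shape but not on the diameter. Without the weights $1/d(\Omega)$ the $L^2$ terms would scale like $s^{3/2}$ and the estimate could not be invariant under similarity transformations.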


Remark 1 An early decomposition of a subspace of $\mathbf{H}(\mathbf{curl}, \Omega) \cap \mathbf{H}(\mathrm{div}, \Omega)$ into
a regular part in $\mathbf{H}^1(\Omega)$ and a singular part in $\nabla H^1(\Omega)$ can be found in [4] and
in [5, Proposition 5.1], see also [7, Sect. 3] and references therein. Theorem 2
was proved in [14, Lemma 2.4] for the case of $\Gamma_D = \partial\Omega$ and without the
$L^2$-stability estimate, following [5, Proposition 5.1]. Pasciak and Zhao [28, Lemma 2.2]
provided a version for simply connected $\Omega$ and the case $\Gamma_D = \partial\Omega$ with pure
$L^2$-stability, but p is only constant on each connected component of $\partial\Omega$ (see also
Theorem 5 and Remark 3). This result was refined in [24, Thm. 3.1]. For the case
$\Gamma_D = \emptyset$, [14, Lemma 2.4] gives a similar decomposition but $\nabla p$ must be replaced
by an element from $\mathbf{H}(\mathbf{curl}\,0, \Omega)$ in general. Finally, Theorem 2 without the pure
$L^2$-stability was proved in [20, Thm. 5.2].³


Remark 2 The constant C in Theorem 2 depends mainly on the stability constants
of key extension operators. If the bulge $\Upsilon_D$ has multiple components $\Upsilon_{D,k}$, the final
estimate will depend on the relative distances between $\Upsilon_{D,k}$, $\Upsilon_{D,\ell}$, $k \neq \ell$, and the
ratios $d(\Upsilon_{D,k}) / d(\Omega)$.


³ This reference contains a typo which is easily identified when inspecting the proof: In general, $\mathbf{z}$
cannot be estimated in terms of $\|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega}$ but one must use the full $\mathbf{H}(\mathbf{curl})$ norm.

Remark 3 If $\Gamma_D = \partial\Omega$, one obtains only $p \in H^1(\Omega)$ being constant on each
connected component of $\Gamma_D$ but the improved bound


\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C\, \|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega}. \]


Results on regular decompositions in this special case can be found in [24, 28].


3.2 Regular Potentials for Some Divergence-Free Functions 


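Let the domain $\Omega$ and the Dirichlet boundary part $\Gamma_D$ be as introduced in Sect. 2 and
let $\Gamma_i$, $i = 0, \ldots, \beta_2(\Omega)$, denote the connected components of $\partial\Omega$, where $\beta_2(\Omega)$ is
the second Betti number of $\Omega$.

We define the space⁴


\[ \mathbf{H}_{\Gamma_D}(\mathrm{div}\,00, \Omega) := \{\mathbf{q} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}\,0, \Omega) : \langle \gamma_n \mathbf{q}, 1 \rangle_{\Gamma_i} = 0, \; i = 0, \ldots, \beta_2(\Omega)\}. \tag{9} \]


Above $\gamma_n$ denotes the normal trace operator, and the duality pairing is that between
$H^{-1/2}(\Gamma_i)$ and $H^{1/2}(\Gamma_i)$. If $\Gamma_D = \emptyset$ we simply drop the subscript $\Gamma_D$. Obviously,


\[ \mathbf{H}_{\Gamma_D}(\mathrm{div}\,00, \Omega) \subset \mathbf{H}(\mathrm{div}\,00, \Omega). \]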


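The next result identifies the above space as the range of the curl operator.

Theorem 3 (Regular Potential of Range(curl)) Let $(\Omega, \Gamma_D)$ be as in Sect. 2 and
assume in addition that each connected component $\Upsilon_{D,k}$ of the bulge has vanishing
first Betti number, $\beta_1(\Upsilon_{D,k}) = 0$. Then

\[ \mathbf{H}_{\Gamma_D}(\mathrm{div}\,00, \Omega) = \mathbf{curl}\, \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) = \mathbf{curl}\, \mathbf{H}^1_{\Gamma_D}(\Omega), \]

and for each $\mathbf{q} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}\,00, \Omega)$ there exists $\boldsymbol{\psi} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ depending linearly on
$\mathbf{q}$ such that

\[ \mathbf{curl}\, \boldsymbol{\psi} = \mathbf{q} \quad \text{and} \quad \|\nabla \boldsymbol{\psi}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\boldsymbol{\psi}\|_{0,\Omega} \le C\, \|\mathbf{q}\|_{0,\Omega}, \]

where C depends only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.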


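⁴ Alternatively we can define $\mathbf{H}_{\Gamma_D}(\mathrm{div}\,00, \Omega)$ as the functions in $\mathbf{H}_{\Gamma_D}(\mathrm{div}\,0, \Omega)$ orthogonal to the
harmonic Dirichlet fields $\mathbf{H}(\mathrm{div}\,0, \Omega) \cap \mathbf{H}_0(\mathbf{curl}\,0, \Omega)$.

Remark 4 For the case that $\Gamma_D = \emptyset$, we reproduce the classical result

\[ \mathbf{H}(\mathrm{div}\,00, \Omega) = \mathbf{curl}\, \mathbf{H}(\mathbf{curl}, \Omega) = \mathbf{curl}\, \mathbf{H}^1(\Omega), \]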


see [11, Thm. 3.4]. In that case, Step 4 of the proof can be left out and у = wi 
which is why $\mathrm{div}\, \boldsymbol{\psi} = 0$ in $\Omega$. This property, however, is lost in the general case.


3.3 Rotation-Bounded Regular Decomposition of H (curl) 


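We can now formulate another new variety of regular decompositions, for which the
$\mathbf{H}^1$-component will vanish for curl-free fields.


Theorem 4 (Rotation-Bounded Regular Decomposition of H(curl) (I)) Let
$(\Omega, \Gamma_D)$ be as in Sect. 2 and assume, in addition, that each connected component
$\Upsilon_{D,k}$ of the bulge has vanishing first Betti number, $\beta_1(\Upsilon_{D,k}) = 0$. Then, for
each $\mathbf{u} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ and a curl-free vector field
$\mathbf{h} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}\,0, \Omega)$, depending linearly on $\mathbf{u}$ such that


\[ \mathbf{u} = \mathbf{z} + \mathbf{h}, \]
\[ \|\mathbf{h}\|_{0,\Omega} \le \|\mathbf{u}\|_{0,\Omega} + C\, d(\Omega)\, \|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega}, \]
\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C\, \|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega}, \]


where C depends only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.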


Remark 5 The constant C in Theorem 4 depends essentially on the stability
constants of the divergence-free extension operator and the (adapted) Stein
extension operator.


Another stronger version of the rotation-bounded regular decomposition of 
H(curl) gets rid of the assumptions on the topology of the Dirichlet boundary and 
has improved stability properties (though with less explicit constants). 


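Theorem 5 (Rotation-Bounded Regular Decomposition of H(curl) (II)) Let
$(\Omega, \Gamma_D)$ be as in Sect. 2. Then for each $\mathbf{u} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$
and a curl-free $\mathbf{h} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}\,0, \Omega)$ depending linearly on $\mathbf{u}$ such that


\[ \mathbf{u} = \mathbf{z} + \mathbf{h}, \]
\[ \|\mathbf{z}\|_{0,\Omega} + \|\mathbf{h}\|_{0,\Omega} \le C\, \|\mathbf{u}\|_{0,\Omega}, \]
\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C\, \|\mathbf{curl}\,\mathbf{u}\|_{0,\Omega}, \]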


where C depends only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.

Remark 6 For the case $\Gamma_D = \partial\Omega$ the result of the theorem is already proved by
Remark 3 since we obtain $\mathbf{u} = \mathbf{z} + \nabla p$ with $\nabla p \in \nabla H^1_{\partial\Omega,\mathrm{const}}(\Omega) \subset \mathbf{H}_0(\mathbf{curl}\,0, \Omega)$.


Remark 7 We would like to emphasize that both in Theorems 2 and 5, the domain
$\Omega$ may be non-convex, non-smooth, and may have non-trivial topology: It may
have handles and its boundary may have multiple components. Also the Dirichlet
boundary $\Gamma_D$ may have multiple components, each of which may have non-trivial
topology. Moreover, we have the pure $\mathbf{L}^2(\Omega)$-stability in both theorems. In this
sense, the results of Theorems 2 and 5 are superior to those found, e.g., in [7,
Thm 3.4], [19] or the more recent ones in [8, Thm. 2.3], [22].


Remark 8 If $\Omega$ has vanishing first Betti number, $\beta_1(\Omega) = 0$, then
$\mathbf{H}_{\Gamma_D}(\mathbf{curl}\,0, \Omega) = \nabla H^1_{\Gamma_D,\mathrm{const}}(\Omega)$. Hence, we can split each $\mathbf{u} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$ into $\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$
and $\nabla p$ with $p \in H^1(\Omega)$ being constant on each connected component of $\Gamma_D$. If
$\Gamma_D$ is connected, then $p \in H^1_{\Gamma_D}(\Omega)$. Summarizing, if $\Omega$ has no handles and if $\Gamma_D$
is connected, then we have the combined features of Theorems 2 and 5.


Finally, we mention that the regular decomposition theorems spawn projection 
operators that play a fundamental role in the analysis of weak formulations of 
Maxwell’s equations in frequency domain [14, Sect. 5]. 


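Corollary 1 Let $(\Omega, \Gamma_D)$ be as in Sect. 2. Then there exist continuous projection
operators $\mathsf{R} : \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) \to \mathbf{H}^1_{\Gamma_D}(\Omega)$ and $\mathsf{N} : \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) \to \mathbf{H}_{\Gamma_D}(\mathbf{curl}\,0, \Omega)$
such that $\mathsf{R} + \mathsf{N} = \mathrm{id}$ and


\[ \|\mathsf{R}\mathbf{v}\|_{H^1(\Omega)} + \|\mathsf{N}\mathbf{v}\|_{L^2(\Omega)} \le C\, \|\mathbf{v}\|_{\mathbf{H}(\mathbf{curl}, \Omega)} \quad \forall \mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega), \]


where C is a constant independent of $\mathbf{v}$. Moreover, $\mathsf{F} : \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) \to \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$
defined by $\mathsf{F}\mathbf{v} := \mathsf{R}\mathbf{v} - \mathsf{N}\mathbf{v}$ is an isomorphism.


Remark 9 The $L^2$-estimates from Theorem 4 then show that the corresponding
operator $\mathsf{R}$ can be extended to a continuous operator mapping from $\mathbf{L}^2(\Omega)$ to
$\mathbf{L}^2(\Omega)$.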


4 Regular Decompositions and Potentials Related to H(div)


The developments of this section are largely parallel to those of Sect. 3 with some 
new aspects concerning extensions and topological considerations. 


4.1 Rotation-Based Regular Decomposition of H (div) 


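The following theorem is the H(div)-counterpart of Theorem 2.

Theorem 6 (Rotation-Based Regular Decomposition of H(div)) Let $(\Omega, \Gamma_D)$
satisfy the assumptions made in Sect. 2. Then for each $\mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega)$ there exist
$\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ and $\mathbf{q} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ depending linearly on $\mathbf{v}$ such that


\[ \mathbf{v} = \mathbf{z} + \mathbf{curl}\, \mathbf{q}, \]
\[ \|\mathbf{z}\|_{0,\Omega} + \|\mathbf{curl}\, \mathbf{q}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{q}\|_{0,\Omega} \le C\, \|\mathbf{v}\|_{0,\Omega}, \]
\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\nabla \mathbf{q}\|_{0,\Omega} \le C \left( \|\mathrm{div}\, \mathbf{v}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{v}\|_{0,\Omega} \right), \]


with constant C depending only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.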


4.2 Regular Potential with Prescribed Divergence 


The next result carries Theorem 3 over to H (div). 


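Theorem 7 (Regular Potentials for the Image Space of div) Let $(\Omega, \Gamma_D)$ be as
in Sect. 2 and, in addition, assume that each connected component $\Upsilon_{D,k}$ of the bulge
has a connected boundary, i.e., $\beta_2(\Upsilon_{D,k}) = 0$. Then


\[ L^2(\Omega) = \mathrm{div}\, \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega) = \mathrm{div}\, \mathbf{H}^1_{\Gamma_D}(\Omega). \]


Moreover, for each $v \in L^2(\Omega)$ there exists $\mathbf{q} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ depending linearly on $v$
such that, with a constant C depending on $\Omega$ and $\Gamma_D$ but not on $d(\Omega)$,


\[ \mathrm{div}\, \mathbf{q} = v \quad \text{and} \quad \|\nabla \mathbf{q}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{q}\|_{0,\Omega} \le C\, \|v\|_{0,\Omega}. \]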


4.3 Divergence-Bounded Regular Decompositions of H (div) 


We can now formulate other variants of regular decompositions of H(div) in analogy 
to what we did in Sect. 3.3. 


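Theorem 8 (Divergence-Bounded Regular Decomposition of H(div) (I)) Let
$(\Omega, \Gamma_D)$ be as in Sect. 2. In addition, assume that each connected component
$\Upsilon_{D,k}$ of the bulge has a connected boundary, i.e., $\beta_2(\Upsilon_{D,k}) = 0$. Then, for each
$\mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$ and a divergence-free vector field
$\mathbf{h} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}\,0, \Omega)$ depending linearly on $\mathbf{v}$ such that


\[ \mathbf{v} = \mathbf{z} + \mathbf{h}, \tag{10} \]
\[ \|\mathbf{h}\|_{0,\Omega} \le \|\mathbf{v}\|_{0,\Omega} + C\, d(\Omega)\, \|\mathrm{div}\, \mathbf{v}\|_{0,\Omega}, \tag{11} \]
\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C\, \|\mathrm{div}\, \mathbf{v}\|_{0,\Omega}, \tag{12} \]


where C depends only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.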


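The last variant of regular decomposition of H(div) dispenses with the
assumptions on the topology of the Dirichlet boundary and has better stability
properties than the splitting from Theorem 8 (though with less explicit constants).


Theorem 9 (Divergence-Bounded Regular Decomposition of H(div) (II)) Let
$(\Omega, \Gamma_D)$ be as in Sect. 2. Then, for each $\mathbf{v} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega)$ there exist $\mathbf{z} \in \mathbf{H}^1_{\Gamma_D}(\Omega)$
and a divergence-free vector field $\mathbf{h} \in \mathbf{H}_{\Gamma_D}(\mathrm{div}\,0, \Omega)$ depending linearly on $\mathbf{v}$ such
that


\[ \mathbf{v} = \mathbf{z} + \mathbf{h}, \tag{13} \]
\[ \|\mathbf{z}\|_{0,\Omega} + \|\mathbf{h}\|_{0,\Omega} \le C\, \|\mathbf{v}\|_{0,\Omega}, \tag{14} \]
\[ \|\nabla \mathbf{z}\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{z}\|_{0,\Omega} \le C\, \|\mathrm{div}\, \mathbf{v}\|_{0,\Omega}, \tag{15} \]


where C depends only on the shape of $\Omega$ and $\Gamma_D$, but not on $d(\Omega)$.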


5 Discrete Counterparts of the Regular Decompositions 


The discrete setting to which we want to extend the concept of regular decomposi- 
tions is provided by finite element exterior calculus (FEEC, [2]) which introduces 
finite element subspaces of H(curl) and H(div) as special instances of spaces of 
discrete differential forms. In this section we confine ourselves to the lowest-order 
case of piecewise linear finite element functions. 

Throughout, we assume that $(\Omega, \Gamma_D)$ is as in Sect. 2, and, additionally, that $\Omega$ is
a polyhedron and that $\partial\Gamma_D$ consists of straight line segments. All considerations take
for granted a shape-regular family of meshes $(\mathcal{T}^h)_h$ of $\Omega$, consisting of tetrahedral
elements, and resolving $\Gamma_D$ in the sense that $\Gamma_D$ is a union of faces of some of the
tetrahedra.

The following finite element spaces will be relevant: 


* the space $W^0_{h,\Gamma_D}(\Omega)$ of $H^1_{\Gamma_D}(\Omega)$-conforming piecewise linear Lagrangian finite
element functions,

* the space $W^1_{h,\Gamma_D}(\Omega)$ of $\mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega)$-conforming lowest order Nédélec elements,
also known as edge elements,

* the space $W^2_{h,\Gamma_D}(\Omega)$ of $\mathbf{H}_{\Gamma_D}(\mathrm{div}, \Omega)$-conforming lowest order tetrahedral
Raviart-Thomas finite elements, also known as face elements,

* the space $\mathbf{V}_{h,\Gamma_D}(\Omega) := [W^0_{h,\Gamma_D}(\Omega)]^3$ of piecewise linear globally continuous
vector fields vanishing on $\Gamma_D$.


Functions in $W^\ell_{h,\Gamma_D}(\Omega)$, $\ell = 1, 2, 3$, are so-called Whitney forms, lowest-order
discrete differential forms of the first family as introduced in [13] and [2, Sect. 5].


5.1 Discrete Regular Decompositions for Edge Elements 


Commuting projectors, also known as co-chain projectors, are the linchpin of FEEC
theory [2, Sect. 7], and it is not different with our developments. Thus, let


\[ R^0_{h,\Gamma_D} : H^1_{\Gamma_D}(\Omega) \to W^0_{h,\Gamma_D}(\Omega) \quad \text{and} \quad R^1_{h,\Gamma_D} : \mathbf{H}_{\Gamma_D}(\mathbf{curl}, \Omega) \to W^1_{h,\Gamma_D}(\Omega) \]

denote the continuous, boundary-aware cochain projectors from [17, Sect. 3.2.6],
which extend the pioneering work [10] by Falk and Winther. These two linear
operators are projectors onto their ranges, they fulfill the commuting property


\[ \nabla (R^0_{h,\Gamma_D} \varphi) = R^1_{h,\Gamma_D} (\nabla \varphi) \quad \forall \varphi \in H^1_{\Gamma_D}(\Omega), \tag{16} \]


and local stability estimates.


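Theorem 10 ([17, Thm. 1.2]) For each $\mathbf{v}_h \in W^1_{h,\Gamma_D}(\Omega)$ there exists a continuous
and piecewise linear vector field $\mathbf{z}_h \in \mathbf{V}_{h,\Gamma_D}(\Omega)$, a continuous and piecewise
linear scalar function $p_h \in W^0_{h,\Gamma_D}(\Omega)$, and a remainder $\tilde{\mathbf{v}}_h \in W^1_{h,\Gamma_D}(\Omega)$, all
depending linearly on $\mathbf{v}_h$, providing the discrete regular decomposition


\[ \mathbf{v}_h = R^1_{h,\Gamma_D} \mathbf{z}_h + \tilde{\mathbf{v}}_h + \nabla p_h \]

and satisfying the stability estimates

\[ \|\mathbf{z}_h\|_{0,\Omega} + \|\nabla p_h\|_{0,\Omega} + \|\tilde{\mathbf{v}}_h\|_{0,\Omega} \le C\, \|\mathbf{v}_h\|_{0,\Omega}, \tag{17} \]
\[ \|\nabla \mathbf{z}_h\|_{0,\Omega} + \|h^{-1} \tilde{\mathbf{v}}_h\|_{0,\Omega} \le C \left( \|\mathbf{curl}\, \mathbf{v}_h\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{v}_h\|_{0,\Omega} \right), \tag{18} \]


where C is a generic constant that depends only on the shape of $(\Omega, \Gamma_D)$, but not on
$d(\Omega)$, and on the shape regularity constant of $\mathcal{T}^h$. Above, $h^{-1}$ is the piecewise
constant function that is equal to $h_T^{-1}$ on every element T.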


Obviously, this is a discrete counterpart of the regular decomposition of H(curl)
from Theorem 2. The following theorem appears to be new and it corresponds to
the rotation-bounded regular decomposition of Theorem 5. For the sake of brevity
define the discrete nullspace of the curl operator


\[ \mathcal{N}^1_h := \{\mathbf{v}_h \in W^1_{h,\Gamma_D}(\Omega) : \mathbf{curl}\, \mathbf{v}_h = 0\}. \tag{19} \]


If $\Omega$ and $\Gamma_D$ have simple topology, $\mathcal{N}^1_h = \nabla W^0_{h,\Gamma_D}(\Omega)$, but if the first Betti number
of $\Omega$ is non-zero, or if $\Gamma_D$ has multiple components, then a finite-dimensional
co-homology space has to be added [2, Sect. 5.6].
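That gradients of piecewise linear functions lie in the discrete curl nullspace can be seen at the level of degrees of freedom on a single tetrahedron. The sketch below assumes the usual Whitney-form degrees of freedom (vertex values, tangential edge integrals oriented from the lower to the higher vertex index, and face circulations obtained via Stokes' theorem); the function names are hypothetical.

```python
from itertools import combinations

VERTICES = (0, 1, 2, 3)
EDGES = list(combinations(VERTICES, 2))    # oriented from lower to higher index
FACES = list(combinations(VERTICES, 3))


def discrete_gradient(vertex_values):
    """Edge degrees of freedom of the gradient: difference of the endpoint values."""
    return {(i, j): vertex_values[j] - vertex_values[i] for (i, j) in EDGES}


def discrete_curl(edge_values):
    """Face degrees of freedom of the curl: circulation around each face."""
    return {
        (i, j, k): edge_values[(i, j)] + edge_values[(j, k)] - edge_values[(i, k)]
        for (i, j, k) in FACES
    }


# Any nodal function yields a curl-free edge element function.
phi = [2, -1, 5, 3]
curl_of_gradient = discrete_curl(discrete_gradient(phi))
print(curl_of_gradient)    # all four face values are 0

# An edge function that is not a gradient has a nonzero circulation.
single_edge = {edge: 0 for edge in EDGES}
single_edge[(0, 1)] = 1
print(discrete_curl(single_edge)[(0, 1, 2)])    # 1
```

On a single tetrahedron every curl-free edge function is a gradient; the additional co-homology space mentioned above only appears on meshes of domains with non-trivial topology.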


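Theorem 11 (Rotation-Bounded Discrete Regular Decomposition for Edge Elements)
For each $\mathbf{v}_h \in W^1_{h,\Gamma_D}(\Omega)$ there exists a continuous and piecewise linear
vector field $\mathbf{z}_h \in \mathbf{V}_{h,\Gamma_D}(\Omega)$, a curl-free edge element function $\mathbf{h}_h \in N^1_{h,\Gamma_D}$, and a
remainder $\tilde{\mathbf{v}}_h \in W^1_{h,\Gamma_D}(\Omega)$, all depending linearly on $\mathbf{v}_h$, providing the discrete
regular decomposition

$$\mathbf{v}_h = R^1_{h,\Gamma_D} \mathbf{z}_h + \tilde{\mathbf{v}}_h + \mathbf{h}_h$$

and satisfying the stability bounds

$$\left. \begin{array}{r} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\tilde{\mathbf{v}}_h\|_{0,\Omega} \\ \|\mathbf{h}_h\|_{0,\Omega} \end{array} \right\} \le C \|\mathbf{v}_h\|_{0,\Omega}, \qquad \left. \begin{array}{r} \|\nabla \mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1} \tilde{\mathbf{v}}_h\|_{0,\Omega} \end{array} \right\} \le C \|\operatorname{curl} \mathbf{v}_h\|_{0,\Omega},$$

where $C$ is a uniform constant that depends only on the shape of $(\Omega, \Gamma_D)$, but not
on $d(\Omega)$, and on the shape regularity constant of $\mathcal{T}^h(\Omega)$.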


We stress that the statements of Theorems 10 and 11 do not hinge on any
assumptions on the topological properties of $\Omega$ and $\Gamma_D$.


5.2 Discrete Regular Decompositions for Face Elements 


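For face elements, the construction of a boundary-aware co-chain projection
operator

$$R^2_{h,\Gamma_D} : H_{\Gamma_D}(\operatorname{div}, \Omega) \to W^2_{h,\Gamma_D}(\Omega)$$

that commutes with $R^1_{h,\Gamma_D}$ and the curl-operator has not yet been accomplished.
Fortunately, in the case $\Gamma_D = \emptyset$, this operator is available from [10]. Thus, in the
following, we treat only the case $\Gamma_D = \emptyset$ and just omit the subscript $\Gamma_D$. Then,
from [10] we can borrow a linear operator $R^2_h : H(\operatorname{div}, \Omega) \to W^2_h(\Omega)$ such that

$$\operatorname{curl} R^1_h \mathbf{u} = R^2_h \operatorname{curl} \mathbf{u} \quad \forall \mathbf{u} \in H(\operatorname{curl}, \Omega). \qquad (20)$$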


The next result takes Theorem 6 to the discrete setting. 
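Theorem 12 (Discrete Regular Decomposition of $W^2_h(\Omega)$) For each vector field
$\mathbf{v}_h$ in the lowest-order Raviart-Thomas space $W^2_h(\Omega)$, there exists a continuous
and piecewise linear vector field $\mathbf{z}_h \in \mathbf{V}_h(\Omega)$, a vector field $\mathbf{q}_h$ in the lowest-order
Nédélec space $W^1_h(\Omega)$, and a remainder $\tilde{\mathbf{v}}_h \in W^2_h(\Omega)$, all depending linearly on
$\mathbf{v}_h$, providing the discrete regular decomposition

$$\mathbf{v}_h = R^2_h \mathbf{z}_h + \tilde{\mathbf{v}}_h + \operatorname{curl} \mathbf{q}_h,$$

and the stability estimates

$$\left. \begin{array}{r} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\operatorname{curl} \mathbf{q}_h\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{q}_h\|_{0,\Omega} \\ \|\tilde{\mathbf{v}}_h\|_{0,\Omega} \end{array} \right\} \le C \|\mathbf{v}_h\|_{0,\Omega},$$

$$\left. \begin{array}{r} \|\nabla \mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1} \tilde{\mathbf{v}}_h\|_{0,\Omega} \end{array} \right\} \le C \Big( \|\operatorname{div} \mathbf{v}_h\|_{0,\Omega} + \frac{1}{d(\Omega)} \|\mathbf{v}_h\|_{0,\Omega} \Big).$$

The constant $C$ depends only on the shape of $\Omega$, but not on $d(\Omega)$, and the
shape-regularity of $\mathcal{T}^h(\Omega)$.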
Finally, we present a counterpart to the divergence-bounded regular decomposition
of Theorem 9. For convenience we introduce the space of divergence-free face
element functions

$$N^2_h := \{ \mathbf{q}_h \in W^2_h(\Omega) : \operatorname{div} \mathbf{q}_h = 0 \}. \qquad (21)$$


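Theorem 13 (Divergence-Bounded Discrete Regular Decomposition of
$W^2_h(\Omega)$) For each vector field $\mathbf{v}_h$ in the lowest-order Raviart-Thomas space
$W^2_h(\Omega)$, there exists a continuous and piecewise linear vector field $\mathbf{z}_h \in \mathbf{V}_h(\Omega)$,
an element $\mathbf{h}_h$ in the discrete divergence-free subspace $N^2_h$, and a remainder
$\tilde{\mathbf{v}}_h \in W^2_h(\Omega)$, all depending linearly on $\mathbf{v}_h$, providing the discrete regular
decomposition

$$\mathbf{v}_h = R^2_h \mathbf{z}_h + \tilde{\mathbf{v}}_h + \mathbf{h}_h$$

and the stability estimates

$$\left. \begin{array}{r} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\tilde{\mathbf{v}}_h\|_{0,\Omega} \\ \|\mathbf{h}_h\|_{0,\Omega} \end{array} \right\} \le C \|\mathbf{v}_h\|_{0,\Omega}, \qquad \left. \begin{array}{r} \|\nabla \mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1} \tilde{\mathbf{v}}_h\|_{0,\Omega} \end{array} \right\} \le C \|\operatorname{div} \mathbf{v}_h\|_{0,\Omega}.$$

The constants $C$ depend only on the shape of $\Omega$, but not on $d(\Omega)$, and the shape
regularity of $\mathcal{T}^h(\Omega)$.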




Remark 10 The result of Theorem 13 can be viewed as an improvement of the
decompositions in [25] which are elaborated for the case of essential boundary
conditions on $\partial\Omega$.


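Corollary 2 If the second Betti number of $\Omega$ vanishes, that is, if $\partial\Omega$ is connected,
then $\mathbf{h}_h$ in Theorem 13 can be chosen as $\mathbf{h}_h = \operatorname{curl} \mathbf{q}_h$ with $\mathbf{q}_h \in W^1_h(\Omega)$ such that

$$\mathbf{v}_h = R^2_h \mathbf{z}_h + \tilde{\mathbf{v}}_h + \operatorname{curl} \mathbf{q}_h,$$

with the bounds

$$\left. \begin{array}{r} \|\mathbf{z}_h\|_{0,\Omega} \\ \|\tilde{\mathbf{v}}_h\|_{0,\Omega} \\ \|\operatorname{curl} \mathbf{q}_h\|_{0,\Omega} \\ \frac{1}{d(\Omega)} \|\mathbf{q}_h\|_{0,\Omega} \end{array} \right\} \le C \|\mathbf{v}_h\|_{0,\Omega}, \qquad \left. \begin{array}{r} \|\nabla \mathbf{z}_h\|_{0,\Omega} \\ \|h^{-1} \tilde{\mathbf{v}}_h\|_{0,\Omega} \end{array} \right\} \le C \|\operatorname{div} \mathbf{v}_h\|_{0,\Omega}.$$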


Remark 11 The result of Corollary 2 is an improvement of [19, Lemma 5.2] which
assumes a domain $\Omega$ that is smooth enough to allow $H^2$-regularity of the Laplace
problem (2-regular case, for details see [19, Sect. 3]). This lemma is used in [27] in
a domain decomposition framework, where convex subdomains are assumed. With
our improved version, this assumption can be weakened considerably.
Model Reduction by Separation
of Variables: A Comparison Between
Hierarchical Model Reduction
and Proper Generalized Decomposition


Simona Perotto, Michele Giuliano Carlino, and Francesco Ballarin 


1 Introduction 


This paper is meant as a first attempt to compare two procedures which share 
the idea of exploiting separation of variables to perform model reduction, albeit 
with different purposes. Proper Generalized Decomposition (PGD) is essentially 
employed as a powerful tool to deal with parametric problems in several fields 
of application [3, 14, 23]. Parametrized models characterize multi-query contexts, 
such as parameter optimization, statistical analysis or inverse problems. Here, the 
computation of the solution for many different parameters demands, in general, a 
huge computational effort, and this justifies the development of model reduction 
techniques. 

For this purpose, projection-based techniques, such as Proper Orthogonal 
Decomposition (POD) or Reduced Basis methods, are widely used in the 
literature [11]. The idea is to project the discrete operators onto a reduced space so 
that the problem can be solved rapidly in the lower dimensional space. PGD adopts 
a completely different way to deal with parameters. Here, parameters are considered 
as new independent variables of the problem, together with the standard space-time
ones [5]. Although the dimensionality of the problem is inevitably increased, PGD 
transforms the computation of the solution for new values of the parameters into a 
plain evaluation of the reduced solution, with striking computational advantages. 

Hierarchical-Model (HiMod) reduction has been proposed to improve one- 
dimensional (1D) partial differential equation (PDE) solvers for problems defined 
in domains with a geometrically dominant direction, like slabs or pipes [6, 20]. 
The main applicative field of interest is hemodynamics, in particular the modeling 
of blood flow in patient-specific geometries. Purely 1D hemodynamic models 
completely drop the transverse dynamics, which, however, may be locally important
(e.g., in the presence of a stenosis or an aneurism). HiMod aims at providing a
numerical tool to incorporate the transverse components of the 3D solution into
a conceptually 1D solver. To do this, the driving idea is to discretize main and
transverse dynamics in a different way. The latter are generally of secondary
importance and can be described by a few degrees of freedom using a spectral
approximation, in combination, for instance, with a finite element (FE) discretization
of the mainstream.

The parametric version of HiMod (namely, HiPOD) is a more recent proposal [4, 
13]. On the other hand, PGD is not so widely employed in a non-parametric setting, 
despite its original formulation [12]. Nevertheless, for the sake of comparison, in 
this paper we consider the non-parametric as well as the parametric versions of both 
the HiMod and PGD approaches. The goal is to begin a preliminary comparative 
analysis between the two methodologies, to highlight the respective weaknesses and 
strengths. The main limit of PGD remains its inability to deal with non-Cartesian 
geometries without losing the computational benefits arising from the separability 
of the spatial coordinates. HiMod turns out to be more flexible from a geometric 
viewpoint. On the other hand, PGD turns out to be extremely effective for parametric 
problems thanks to the explicit expression of the PGD solution in terms of the 
parameters, while HiPOD can be classified as a projection-based method with all 
the associated drawbacks. In perspective, the ultimate goal is to merge HiMod with 
PGD to emphasize the good features and mitigate the intrinsic limits of the two 
methods taken alone. 


2 The HiMod Approach 


Hierarchical Model reduction proved to be an efficient and reliable method to
deal with phenomena characterized by dominant dynamics [10]. In general, the
computational domain itself exhibits an intrinsic directionality. We assume $\Omega \subset \mathbb{R}^d$
($d = 2, 3$) to coincide with a $d$-dimensional fiber bundle, $\Omega = \bigcup_{x \in \Omega_{1D}} \{x\} \times \gamma_x$,
where $\Omega_{1D} \subset \mathbb{R}$ denotes the supporting fiber aligned with the main stream, while
$\gamma_x \subset \mathbb{R}^{d-1}$ is the transverse fiber at $x \in \Omega_{1D}$, parallel to the transverse dynamics.
For the sake of simplicity, we identify $\Omega_{1D}$ with a straight segment, $(x_0, x_1)$. We
refer to [15, 21] for the case where $\Omega_{1D}$ is curvilinear. From a computational
viewpoint, the idea is to exploit a map, $\Psi : \Omega \to \hat\Omega$, transforming the physical
domain, $\Omega$, into a reference domain, $\hat\Omega$, and to make explicit computations in $\hat\Omega$ only.
Typically, $\hat\Omega$ coincides with a rectangle in 2D, with a cylinder with circular section
in 3D. To define $\Psi$, for each $x \in \Omega_{1D}$, we introduce the map, $\psi_x : \gamma_x \to \hat\gamma_{d-1}$,
from fiber $\gamma_x$ to the reference transverse fiber, $\hat\gamma_{d-1}$, so that the reference domain
coincides with $\hat\Omega = \bigcup_{x \in \Omega_{1D}} \{x\} \times \hat\gamma_{d-1}$. The supporting fiber is preserved by map
$\Psi$, which modifies the lateral boundaries only.
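
As an illustration of the map between a physical fiber and the reference fiber, the following minimal sketch assumes a 2D domain whose transverse fiber at $x$ is the interval $(g_{low}(x), g_{up}(x))$ and takes the affine map onto $(0, 1)$. The affine form of the map, the wall functions `g_low`, `g_up` and all function names are assumptions made for this example only; they are not taken from the paper.

```python
def fiber_map(x, y, g_low, g_up):
    """psi_x(y): affine map from the fiber (g_low(x), g_up(x)) onto (0, 1)."""
    return (y - g_low(x)) / (g_up(x) - g_low(x))


def map_derivatives(x, y, g_low, g_up, dg_low, dg_up):
    """Return d(psi_x)/dx, d(psi_x)/dy and the fiber width (the inverse of d(psi_x)/dy)."""
    width = g_up(x) - g_low(x)
    y_hat = (y - g_low(x)) / width
    # Chain rule applied to psi_x(y) = (y - g_low(x)) / (g_up(x) - g_low(x))
    d1 = -(dg_low(x) + y_hat * (dg_up(x) - dg_low(x))) / width
    d2 = 1.0 / width
    return d1, d2, width


# Hypothetical channel whose walls open linearly along the main stream.
g_low = lambda x: -1.0 - 0.1 * x
g_up = lambda x: 1.0 + 0.2 * x
dg_low = lambda x: -0.1
dg_up = lambda x: 0.2
```

For a rectangular domain the walls are constant, so the derivative with respect to $x$ vanishes and the map only rescales the transverse coordinate.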

We consider now the (full) problem to be reduced. Due to the comparative 
purposes of the paper, we focus on a scalar elliptic equation, and, in particular, on 
the associated weak formulation, 


$$\text{find } u \in V : \quad a(u, v) = F(v) \quad \forall v \in V, \qquad (1)$$

where $V \subseteq H^1(\Omega)$, $a(\cdot, \cdot) : V \times V \to \mathbb{R}$ is a continuous and coercive bilinear
form and $F(\cdot) : V \to \mathbb{R}$ is a continuous linear functional. To provide the HiMod
formulation for problem (1), we introduce the hierarchical reduced space

$$V_m = \Big\{ v_m(x, y) = \sum_{k=1}^{m} \tilde v_k(x) \varphi_k(\psi_x(y)), \text{ with } \tilde v_k \in V^h_{1D},\ x \in \Omega_{1D},\ y \in \gamma_x \Big\} \qquad (2)$$

for a modal index $m \in \mathbb{N}^+$, where $V^h_{1D} \subset H^1(\Omega_{1D})$ is a discrete space of dimension
$N_h$, associated with a partition $\mathcal{T}_h$ of $\Omega_{1D}$, while $\{\varphi_k\}_{k \in \mathbb{N}^+}$ denotes a modal basis of
functions orthogonal with respect to the $L^2(\hat\gamma_{d-1})$-scalar product. Index $m$ sets the
hierarchical level of the HiMod space, being $V_m \subset V_{m+1}$, for any $m$. Concerning
$V^h_{1D}$, we adopt here a standard FE space, although any discrete space can be
employed (see, e.g., [21], where an isogeometric discretization is used). Functions
in $V^h_{1D}$ have to include the boundary conditions on $\{x_0\} \times \gamma_{x_0}$ and $\{x_1\} \times \gamma_{x_1}$;
analogously, the modal functions have to take into account the boundary data along
the horizontal sides. In Sect. 4 further comments are provided about the selection of
the modal basis and of the modal index $m$. The HiMod formulation for problem (1)
thus reads


$$\text{find } u_m^{HiMod} \in V_m : \quad a(u_m^{HiMod}, v_m) = F(v_m) \quad \forall v_m \in V_m. \qquad (3)$$

To ensure the well-posedness of formulation (3) and the convergence of the HiMod
approximation, $u_m^{HiMod}$, to the full solution, $u$, we endow the HiMod space with a
conformity and a spectral approximability hypothesis, and we introduce a standard
density assumption on the discrete space $V^h_{1D}$ (see [20] for all the details).


The HiMod solution can be fully characterized by introducing a basis, $\{\theta_l\}_{l=1}^{N_h}$,
for the space $V^h_{1D}$. Actually, each modal coefficient, $\tilde u_k$, of $u_m^{HiMod}$ can be expanded
in terms of such a basis, so that we obtain the modal representation

$$u_m^{HiMod}(x, y) = \sum_{k=1}^{m} \sum_{l=1}^{N_h} \tilde u_{k,l}\, \theta_l(x)\, \varphi_k(\psi_x(y)). \qquad (4)$$

The actual unknowns of problem (3) become the $m N_h$ coefficients $\{\tilde u_{k,l}\}_{k=1,l=1}^{m,N_h}$.
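With reference to the Poisson problem, $-\Delta u = f$, completed with full homogeneous
Dirichlet boundary data, the corresponding HiMod formulation, after
exploiting (4) in (3) and picking $v_m(x, y) = \theta_i(x) \varphi_j(\psi_x(y))$ with $i = 1, \ldots, N_h$
and $j = 1, \ldots, m$, reduces to the system of $m N_h$ 1D equations in the $m N_h$
unknowns $\{\tilde u_{k,l}\}_{k=1,l=1}^{m,N_h}$,

$$\sum_{k=1}^{m} \sum_{l=1}^{N_h} \tilde u_{k,l} \int_{\Omega_{1D}} \Big[ \hat r^{11}_{kj}(x)\, \theta_l'(x) \theta_i'(x) + \hat r^{10}_{kj}(x)\, \theta_l'(x) \theta_i(x) + \hat r^{01}_{kj}(x)\, \theta_l(x) \theta_i'(x) + \hat r^{00}_{kj}(x)\, \theta_l(x) \theta_i(x) \Big] dx = \int_{\Omega_{1D}} \hat f_j(x)\, \theta_i(x)\, dx,$$

where $\hat r^{ab}_{kj}(x) = \int_{\hat\gamma_{d-1}} r^{ab}_{kj}(x, \hat y)\, |J|\, d\hat y$ with $a, b = 0, 1$, $J = \det(D_2^{-1}(x, \psi_x^{-1}(\hat y)))$
with $D_2 = D_2(x, \psi_x^{-1}(\hat y)) = \nabla_y \psi_x$,

$$r^{00}_{kj}(x, \hat y) = \varphi_k'(\hat y) \varphi_j'(\hat y) (D_1^2 + D_2^2), \qquad r^{01}_{kj}(x, \hat y) = \varphi_k'(\hat y) \varphi_j(\hat y) D_1,$$
$$r^{10}_{kj}(x, \hat y) = \varphi_k(\hat y) \varphi_j'(\hat y) D_1, \qquad r^{11}_{kj}(x, \hat y) = \varphi_k(\hat y) \varphi_j(\hat y),$$

with $D_1 = D_1(x, \psi_x^{-1}(\hat y)) = \partial \psi_x / \partial x$, and $\hat f_j(x) = \int_{\hat\gamma_{d-1}} f(x, \psi_x^{-1}(\hat y))\, \varphi_j(\hat y)\, |J|\, d\hat y$.
The information associated with the transverse dynamics is lumped in the
coefficients $\{\hat r^{ab}_{kj}\}$, so that the HiMod system is solved on the supporting fiber, $\Omega_{1D}$.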


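Collecting the HiMod unknowns, by mode, in the vector $\mathbf{u}_m^{HiMod} \in \mathbb{R}^{m N_h}$, such that

$$\mathbf{u}_m^{HiMod} = [\tilde u_{1,1}, \tilde u_{1,2}, \ldots, \tilde u_{1,N_h}, \tilde u_{2,1}, \ldots, \tilde u_{m,1}, \ldots, \tilde u_{m,N_h}]^T, \qquad (5)$$

we can rewrite the HiMod system in the compact form

$$A_m^{HiMod} \mathbf{u}_m^{HiMod} = \mathbf{f}_m^{HiMod}, \qquad (6)$$

where $A_m^{HiMod} \in \mathbb{R}^{m N_h \times m N_h}$ and $\mathbf{f}_m^{HiMod} \in \mathbb{R}^{m N_h}$ are the HiMod stiffness
matrix and right-hand side, respectively, with $[\mathbf{f}_m^{HiMod}]_{ji} = \int_{\Omega_{1D}} \hat f_j(x)\, \theta_i(x)\, dx$, and
$[A_m^{HiMod}]_{ji,kl} = \sum_{a,b=0}^{1} \int_{\Omega_{1D}} \hat r^{ab}_{kj}(x)\, \frac{d^a \theta_l}{dx^a}(x)\, \frac{d^b \theta_i}{dx^b}(x)\, dx$. According to (5), for each
modal index $j$, between 1 and $m$, the nodal index, $i$, takes the values $1, \ldots, N_h$.
Thus, HiMod reduction leads to solve a system of order $m N_h$, independently of the
dimension of the full problem (1).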
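
To make the structure of the HiMod system concrete, the sketch below treats a deliberately simple special case that is assumed here for illustration: the Poisson problem on the unit square with homogeneous Dirichlet data, the identity map between physical and reference fibers, sinusoidal modes $\sqrt{2}\sin(k\pi y)$ and linear finite elements on a uniform partition of the supporting fiber. Under these assumptions the transverse coefficients are diagonal in the modal indices, so the system splits into $m$ independent tridiagonal problems. The function names and the midpoint quadrature are choices of this example, not of the paper.

```python
import math


def solve_tridiag(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system with constant entries."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):
        den = diag - off * c[i - 1]
        c[i] = off / den
        d[i] = (rhs[i] - off * d[i - 1]) / den
    sol = [0.0] * n
    sol[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol


def phi(k, y):
    """Sinusoidal modal function, orthonormal in L^2(0, 1)."""
    return math.sqrt(2.0) * math.sin(k * math.pi * y)


def himod_poisson_unit_square(f, m, n, n_quad=200):
    """Return the interior nodes and, for each mode, its nodal coefficients."""
    h = 1.0 / n
    nodes = [i * h for i in range(n + 1)]
    quad = [(q + 0.5) / n_quad for q in range(n_quad)]  # midpoint rule on the fiber
    modes = []
    for k in range(1, m + 1):
        # Modal source coefficient f_k(x) = int_0^1 f(x, y) phi_k(y) dy at every node
        fk = [sum(f(x, y) * phi(k, y) for y in quad) / n_quad for x in nodes]
        # Load vector: 1D mass matrix applied to the interpolant of f_k
        load = [h * (fk[i - 1] + 4.0 * fk[i] + fk[i + 1]) / 6.0 for i in range(1, n)]
        lam = (k * math.pi) ** 2  # integral of (phi_k')^2 over the reference fiber
        diag = 2.0 / h + lam * 4.0 * h / 6.0
        off = -1.0 / h + lam * h / 6.0
        modes.append(solve_tridiag(diag, off, load))
    return nodes[1:-1], modes


def evaluate(modes, i, y):
    """HiMod approximation at interior node i and transverse coordinate y."""
    return sum(u_k[i] * phi(k + 1, y) for k, u_k in enumerate(modes))
```

For a source such as $f = 2\pi^2 \sin(\pi x)\sin(\pi y)$ only the first mode is excited, and the reduced solution should approach $\sin(\pi x)\sin(\pi y)$ as the partition is refined. On a curved domain the modes couple through the map derivatives and a single banded system of order $m N_h$ has to be solved instead.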
3 The PGD Approach


To perform PGD, we have to introduce on problem (1) a separability hypothesis
with respect to both the spatial variables and the data [5, 22]. Thus, domain $\Omega \subset \mathbb{R}^d$
coincides with the rectangle $\Omega_x \times \Omega_y$ if $d = 2$, with the parallelepiped
$\Omega_x \times \Omega_y \times \Omega_z$ (total separability) or with the cylinder $\Omega_x \times \Omega_{\mathbf{y}}$ (partial separability) if
$d = 3$, for $\Omega_x, \Omega_y, \Omega_z \subset \mathbb{R}$ and $\Omega_{\mathbf{y}} \subset \mathbb{R}^2$, being $\mathbf{y} = (y, z)$. In the following, we
focus on partial separability, since it is more suited to match HiMod reduction with
PGD. Analogously, we assume that the generic problem data, $d = d(x, y, z)$, can
be written as $d = d^x(x)\, d^{\mathbf{y}}(\mathbf{y})$. The separability is inherited by the PGD space

$$W_m = \Big\{ w_m(x, \mathbf{y}) = \sum_{k=1}^{m} w_k^x(x)\, w_k^{\mathbf{y}}(\mathbf{y}), \text{ with } w_k^x \in W_h^x,\ w_k^{\mathbf{y}} \in W_h^{\mathbf{y}},\ x \in \Omega_x,\ \mathbf{y} \in \Omega_{\mathbf{y}} \Big\}, \qquad (7)$$

where $W_h^x \subset H^1(\Omega_x)$ and $W_h^{\mathbf{y}} \subset H^1(\Omega_{\mathbf{y}}; \mathbb{R}^{d-1})$ are discrete spaces, with
$\dim(W_h^x) = N_h^x$ and $\dim(W_h^{\mathbf{y}}) = N_h^{\mathbf{y}}$, associated with partitions, $\mathcal{T}_h^x$ and $\mathcal{T}_h^{\mathbf{y}}$, of
$\Omega_x$ and $\Omega_{\mathbf{y}}$, respectively. In general, $W_h^x$ and $W_h^{\mathbf{y}}$ are FE spaces, although, a priori,
any discretization can be adopted. It turns out that $W_m$ is a tensor function space,
being $W_m = W_h^x \otimes W_h^{\mathbf{y}} \subseteq H^1(\Omega_x) \otimes H^1(\Omega_{\mathbf{y}}; \mathbb{R}^{d-1})$.

Index $m$ plays the same role as in the HiMod reduction, setting the level of
detail for the reduced solution (see Sect. 4 for possible criteria to choose $m$). PGD
exploits the hierarchical structure in $W_m$ to build the generic function $w_m \in W_m$. In
particular, $w_m$ is computed as

$$w_m(x, \mathbf{y}) = w_m^x(x)\, w_m^{\mathbf{y}}(\mathbf{y}) + \sum_{k=1}^{m-1} w_k^x(x)\, w_k^{\mathbf{y}}(\mathbf{y}), \qquad (8)$$

where $w_k^x$ and $w_k^{\mathbf{y}}$ are assumed known for $k = 1, \ldots, m-1$, so that the
enrichment functions, $w_m^x$ and $w_m^{\mathbf{y}}$, become the actual unknowns. To provide
the PGD formulation for the Poisson problem considered in Sect. 2, we exploit
representation (8) for the PGD approximation, $u_m^{PGD}$, and we pick the test function
as $X(x) Y(y)$, with $X \in W_h^x$ and $Y \in W_h^y$. The coupling between the unknowns, $u_m^x$
and $u_m^y$, leads to a nonlinear problem, which is tackled by means of the Alternating
Direction Strategy (ADS) [5]. The idea is to look for $u_m^x$ and $u_m^y$ separately via a
fixed point procedure. We introduce an auxiliary index to keep track of the ADS
iterations, so that, at the $p$-th ADS iteration we compute $u_m^{x,p}$ and $u_m^{y,p}$ starting
from the previous approximations, $u_m^{x,g}$ and $u_m^{y,g}$ for $g = 1, \ldots, p-1$, following a

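two-step procedure. First, we compute $u_m^{x,p}$ by identifying $u_m^y$ with $u_m^{y,p-1}$ and by
selecting $Y(y) = u_m^{y,p-1}$ in the test function. This yields, for any $X \in W_h^x$,

$$\int_{\Omega_x} (u_m^{x,p})' X' dx \int_{\Omega_y} [u_m^{y,p-1}]^2 dy + \int_{\Omega_x} u_m^{x,p} X dx \int_{\Omega_y} [(u_m^{y,p-1})']^2 dy$$
$$= \int_{\Omega_x} f^x X dx \int_{\Omega_y} f^y u_m^{y,p-1} dy - \sum_{k=1}^{m-1} \int_{\Omega_x} (u_k^x)' X' dx \int_{\Omega_y} u_k^y u_m^{y,p-1} dy \qquad (9)$$
$$- \sum_{k=1}^{m-1} \int_{\Omega_x} u_k^x X dx \int_{\Omega_y} (u_k^y)' (u_m^{y,p-1})' dy,$$

where the separability of $f$ is exploited (the dependence on the independent
variables, $x$ and $y$, is omitted to simplify notation). Successively, we compute $u_m^{y,p}$,
after setting $u_m^x$ to $u_m^{x,p}$ and choosing function $X$ as $u_m^{x,p}$ in the test function, so
that we obtain, for any $Y \in W_h^y$:

$$\int_{\Omega_x} [(u_m^{x,p})']^2 dx \int_{\Omega_y} u_m^{y,p} Y dy + \int_{\Omega_x} [u_m^{x,p}]^2 dx \int_{\Omega_y} (u_m^{y,p})' Y' dy$$
$$= \int_{\Omega_x} f^x u_m^{x,p} dx \int_{\Omega_y} f^y Y dy - \sum_{k=1}^{m-1} \int_{\Omega_x} (u_k^x)' (u_m^{x,p})' dx \int_{\Omega_y} u_k^y Y dy \qquad (10)$$
$$- \sum_{k=1}^{m-1} \int_{\Omega_x} u_k^x u_m^{x,p} dx \int_{\Omega_y} (u_k^y)' Y' dy.$$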


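The algebraic counterpart of (9) and (10) is obtained by introducing a basis,
$\mathcal{B}^x = \{\theta_\alpha^x\}_{\alpha=1}^{N_h^x}$ and $\mathcal{B}^y = \{\theta_\beta^y\}_{\beta=1}^{N_h^y}$, for the space $W_h^x$ and $W_h^y$, respectively, so that
$u_m^{q,s}(q) = \sum_{i=1}^{N_h^q} \tilde u_{m,i}^{q,s}\, \theta_i^q(q)$, $u_j^q(q) = \sum_{i=1}^{N_h^q} \tilde u_{j,i}^{q}\, \theta_i^q(q)$, with $q = x, y$, $s = p, p-1$,
$j = 1, \ldots, m-1$, and, likewise, $X(x) = \sum_{\alpha=1}^{N_h^x} \tilde x_\alpha \theta_\alpha^x(x)$ and $Y(y) = \sum_{\beta=1}^{N_h^y} \tilde y_\beta \theta_\beta^y(y)$.
Thanks to these expansions and to the arbitrariness of $X$ and $Y$, we can rewrite (9)
and (10) as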


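$$\big[ (\mathbf{u}_m^{y,p-1})^T M^y \mathbf{u}_m^{y,p-1} \big] K^x \mathbf{u}_m^{x,p} + \big[ (\mathbf{u}_m^{y,p-1})^T K^y \mathbf{u}_m^{y,p-1} \big] M^x \mathbf{u}_m^{x,p} = \big[ (\mathbf{u}_m^{y,p-1})^T \mathbf{f}^y \big] \mathbf{f}^x$$
$$- \sum_{k=1}^{m-1} \Big\{ \big[ (\mathbf{u}_m^{y,p-1})^T M^y \mathbf{u}_k^y \big] K^x \mathbf{u}_k^x + \big[ (\mathbf{u}_m^{y,p-1})^T K^y \mathbf{u}_k^y \big] M^x \mathbf{u}_k^x \Big\} \qquad (11)$$

and

$$\big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_m^{x,p} \big] M^y \mathbf{u}_m^{y,p} + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_m^{x,p} \big] K^y \mathbf{u}_m^{y,p} = \big[ (\mathbf{u}_m^{x,p})^T \mathbf{f}^x \big] \mathbf{f}^y$$
$$- \sum_{k=1}^{m-1} \Big\{ \big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_k^x \big] M^y \mathbf{u}_k^y + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_k^x \big] K^y \mathbf{u}_k^y \Big\}, \qquad (12)$$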


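respectively, where vectors $\mathbf{u}_m^{q,s}, \mathbf{u}_j^q \in \mathbb{R}^{N_h^q}$ collect the PGD coefficients, being
$[\mathbf{u}_m^{q,s}]_i = \tilde u_{m,i}^{q,s}$, $[\mathbf{u}_j^q]_i = \tilde u_{j,i}^q$ and $i = 1, \ldots, N_h^q$, $K^x, M^x \in \mathbb{R}^{N_h^x \times N_h^x}$ and $K^y$,
$M^y \in \mathbb{R}^{N_h^y \times N_h^y}$ are the stiffness and mass matrices associated with x- and y-
variables, with $[K^x]_{\alpha l} = \int_{\Omega_x} (\theta_\alpha^x)' (\theta_l^x)' dx$, $[K^y]_{\beta s} = \int_{\Omega_y} (\theta_\beta^y)' (\theta_s^y)' dy$, $[M^x]_{\alpha l} =
\int_{\Omega_x} \theta_\alpha^x \theta_l^x dx$, $[M^y]_{\beta s} = \int_{\Omega_y} \theta_\beta^y \theta_s^y dy$, and where $\mathbf{f}^x \in \mathbb{R}^{N_h^x}$, $\mathbf{f}^y \in \mathbb{R}^{N_h^y}$, with
$[\mathbf{f}^x]_\alpha = \int_{\Omega_x} f^x \theta_\alpha^x dx$, $[\mathbf{f}^y]_\beta = \int_{\Omega_y} f^y \theta_\beta^y dy$, for $\alpha, l = 1, \ldots, N_h^x$, $\beta, s = 1, \ldots, N_h^y$.
Systems (11) and (12) are solved at each ADS iteration, so that the computational
effort characterizing PGD is the one associated with the solution of two systems
of order $N_h^x$ and $N_h^y$, respectively, for each ADS iteration. When a certain stopping
criterion is met (see the next section for more details), the ADS procedure yields vectors
$\mathbf{u}_m^x$ and $\mathbf{u}_m^y$, which identify the enrichment functions $u_m^x$ and $u_m^y$.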

4 HiMod Reduction Versus PGD 


Both HiMod reduction and PGD exploit the separation of variables and, according 
to [5], belong to the a priori approaches, since they do not rely on any solution to the 
problem at hand. Nevertheless, we can easily itemize features which distinguish the 
two techniques. The most relevant ones concern the geometry of $\Omega$, the selection of
the transverse basis and of the modal index, and the numerical implementation of
the two procedures. The pros and cons of the two methods are highlighted below.


4.1 Domain Geometry 


HiMod reduction and PGD advance precise hypotheses on the geometry of the 
computational domain. 

According to the HiMod approach, $\Omega$ is expected to coincide with a fiber
bundle and to be mapped into the reference domain, $\hat\Omega$, by a sufficiently regular
transformation. Actually, map $\Psi$ is assumed differentiable, while map $\psi_x$ is required
to be a $C^1$-diffeomorphism, for all $x \in \Omega_{1D}$ [20]. These hypotheses introduce
some constraints, in particular, on the lateral boundary of $\Omega$ which, e.g., cannot
exhibit kinks. Additionally, geometries of interest in many applications, such as
bifurcations or, more in general, networks are ruled out by the demands on $\psi_x$
and $\Psi$. An approach based on the domain decomposition technique is currently
under investigation as a viable way to deal with such geometries. The isogeometric
version of HiMod (i.e., the HIgaMod approach) will play a crucial role in view of
HiMod simulations for the blood flow modeling in patient-specific geometries [21].

The constraints introduced by PGD on the geometry of $\Omega$ are more restrictive.
The separability hypothesis leads to consider essentially only Cartesian domains.
This considerably reduces the applicability of PGD to practical contexts. Some
techniques are available in the literature to overcome this issue. For instance, in [9]
a generic domain is embedded into a Cartesian geometry, while in [7] the authors
introduce a parametrization map for quadrilateral domains.
Overall, HiMod reduction exhibits a higher geometric flexibility with respect to
PGD, in its straightforward formulation. As discussed in Sect. 5, this limitation can
be removed when considering a parametric setting.


4.2 Modeling of the Transverse Dynamics 


In the HiMod expansion, y-components, $\varphi_k(\psi_x(y))$, are selected before starting
the model reduction. This choice, although coherent with an a priori approach,
introduces a constraint on the dynamics that can be described, so that hints about the
solution trend along the transverse direction can be helpful to select a representative
modal basis. In the original proposal of the HiMod procedure, sinusoidal functions
are employed according to a Fourier expansion [6, 20]. This turns out to be a
reasonable choice when Dirichlet boundary conditions are assigned on the lateral
surface, $\Gamma_{lat} = \bigcup_{x \in \Omega_{1D}} \{x\} \times \partial\gamma_x$, of $\Omega$. Legendre polynomials, properly modified to
include the homogeneous Dirichlet data and orthonormalized, are employed in [20]
as an alternative to a trigonometric expansion. Nevertheless, Legendre polynomials
require high-order quadrature rules to accurately compute coefficients $\{\hat r^{ab}_{kj}\}$.
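
The orthonormality of a sinusoidal modal basis on the reference fiber can be checked numerically with a few lines. The sketch below assumes the reference fiber $(0, 1)$, the normalized modes $\sqrt{2}\sin(k\pi\hat y)$ and a composite midpoint rule; these choices and the function names are assumptions of this example.

```python
import math


def phi(k, y_hat):
    """Sinusoidal mode vanishing at both ends of the reference fiber (0, 1)."""
    return math.sqrt(2.0) * math.sin(k * math.pi * y_hat)


def l2_inner(f, g, n_quad=400):
    """Composite midpoint rule for the L^2(0, 1) scalar product."""
    h = 1.0 / n_quad
    return h * sum(f((q + 0.5) * h) * g((q + 0.5) * h) for q in range(n_quad))


def gram_matrix(m):
    """Matrix of the scalar products between the first m modal functions."""
    return [[l2_inner(lambda t: phi(j, t), lambda t: phi(k, t))
             for k in range(1, m + 1)] for j in range(1, m + 1)]
```

A Gram matrix close to the identity confirms that the modes are orthonormal; the same check can be reused for other candidate bases, where a more accurate quadrature rule may be needed.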

In [1], the concept of educated modal basis is introduced to impose generic
boundary conditions on $\Gamma_{lat}$. The idea is to solve an auxiliary Sturm-Liouville
eigenvalue problem on the transverse reference fiber $\hat\gamma_{d-1}$, to build a basis which
automatically includes the boundary values on $\Gamma_{lat}$. The eigenfunctions of the
Sturm-Liouville problem provide the modal basis. A first attempt to generalize
the educated-HiMod reduction to three-dimensional (3D) cylindrical geometries is
performed in [10], where the Navier-Stokes equations are hierarchically reduced to
model the blood flow in pipes. This generalization is far from being straightforward
due to the employment of polar coordinates. To overcome this issue, we are
currently investigating the HIgaMod approach [21], which allows us to define the
transverse basis as the Cartesian product of 1D modal functions, independently of
the considered geometry.

Additionally, we remark that any modal basis can be precomputed on the
transverse reference fiber before performing the HiMod reduction, thanks to the
employment of map $\Psi$. This considerably simplifies computations.

When applying PGD, y-components are unknown as the ones associated with x.
This leads to the nonlinear problems (9)-(10), thus losing any advantage related
to a precomputation of the HiMod modal basis. On the other hand, PGD does not
constrain the transverse dynamics to follow a prescribed (e.g., sinusoidal) analytical
shape as the HiMod procedure does. The educated-HiMod reduction clearly is out of
this comparison, since the modal basis strictly depends on the problem at hand.

Finally, we observe that HiMod modes are orthonormal with respect to the
$L^2(\hat\gamma_{d-1})$-norm. This property is not ensured by PGD.

Concerning the selection of the modal index $m$ in (2) and (7), as a first attempt,
both HiMod reduction and PGD resort to a trial-and-error approach, so that the
modal index is gradually increased until a check on the accuracy of the reduced
solution is satisfied. For instance, in [6, 20] a qualitative investigation of the contour
plot of the HiMod approximation drives the choice of $m$. Concerning PGD, the
check on the relative enrichment

$$\frac{\| u_m^x u_m^y \|_{L^2(\Omega)}}{\| u_m^{PGD} \|_{L^2(\Omega)}} \le \mathrm{TOL}_E \qquad (13)$$

is usually employed, with $\mathrm{TOL}_E$ a user-defined tolerance [5]. An automatic selection
of index $m$ can yield a significant improvement. In [17, 19], an adaptive procedure
is proposed for HiMod, based on an a posteriori modeling error analysis. In
particular, the estimator in [17] is derived in a goal-oriented setting to control
a quantity of interest, and exploits the hierarchical structure (i.e., the inclusion
$V_m \subset V_{m+d}$, $\forall m, d \in \mathbb{N}^+$) typical of a HiMod reduction. A similar modeling
error analysis is performed in [2] for PGD, although no adaptive algorithm is here
set to automatically pick the reduced model. Paper [19] generalizes the a posteriori
analysis in [17] to an unsteady setting, providing the tool to automatically select $m$
together with the partition $\mathcal{T}_h$ along $\Omega_{1D}$ and the time step.

Finally, HiMod allows one to tune the modal index along the domain $\Omega$, according
to the local complexity of the transverse dynamics. In particular, $m$ can be varied
in different areas of $\Omega$ or, in the presence of very localized dynamics, in
correspondence with specific nodes of the partition $\mathcal{T}_h$. We refer to these two variants as
piecewise and pointwise HiMod reduction, in contrast to a uniform approach, where
the same number of modes is adopted everywhere [16, 18]. This flexibility in the
choice of $m$ is currently not available for PGD. Adaptive strategies to select the
modal index are available for the three variants of the HiMod procedure [17, 19].


4.3 Computational Aspects 


From a computational viewpoint, HiMod reduction and PGD lead to completely
different procedures. Indeed, for a fixed value of $m$, we have to solve only
system (6), of order $m N_h$, when applying HiMod, in contrast to PGD, which demands
the repeated solution of systems (11)-(12), of order $N_h^x$ and $N_h^y$, respectively, because
of the fixed point and the enrichment algorithms. Thus, the direct solution of a single
system, in general of larger order, is replaced by an iterative solution of several
smaller systems. This heterogeneity makes a computational comparison between
PGD and HiMod not so meaningful. We verify the reliability of the HiMod and
PGD procedures on a common test case, by choosing in (1) $V = H^1_0(\Omega)$ with
$\Omega = (0, 5) \times (0, 1)$, $a(u, v) = \int_\Omega [\mu \nabla u \cdot \nabla v + \mathbf{b} \cdot \nabla u\, v]\, d\Omega$ for $\mu = 0.24$, $\mathbf{b} =
[-5, 0]^T$, and $F(v) = \int_\Omega f v\, d\Omega$ with $f(x, y) = 50 \{ \exp[-((x - 2.85)/0.075)^2 -
((y - 0.5)/0.075)^2] + \exp[-((x - 3.75)/0.075)^2 - ((y - 0.5)/0.075)^2] \}$. For
both the methods, we uniformly subdivide $\Omega_{1D}$ into 285 subintervals. We set the
PGD discretization along y as well as the PGD and the HiMod index $m$ in order
Fig. 1 Qualitative comparison between a HiMod (left) and a PGD (right) approximation


to ensure the same accuracy, TOL, on the reduced approximations with respect to a 
reference FE solution, computed on a 2500 x 500 structured mesh. In particular, for 
TOL 8.107? , we have to subdivide interval (0, 1) into 20 uniform subintervals, 
and to set m to 6 and to 9 in the PGD and the HiMod discretization, respectively. 
Sinusoidal functions are chosen for the HiMod modal basis. The ADS iterations are
controlled in terms of the relative increment, as

$$\frac{\| u_m^{x,p} u_m^{y,p} - u_m^{x,p-1} u_m^{y,p-1} \|_{L^2(\Omega)}}{\| u_m^{x,p} u_m^{y,p} \|_{L^2(\Omega)}} \le \mathrm{TOL}_{FP} \qquad (14)$$

with $\mathrm{TOL}_{FP} = 10^{-2}$. Figure 1 shows the reduced approximations (which are fully
comparable with the FE one, here omitted). The contour plots are very similar. The
coarse PGD y-discretization justifies the slight roughness of the PGD contour lines.

Another distinguishing feature between HiMod and PGD is the domain
discretization. Indeed, HiMod requires only the partition $\mathcal{T}_h$ along $\Omega_{1D}$, independently
of the dimension of $\Omega$. No discretization is needed in the y-direction, although we
have to carefully select the quadrature nodes to compute coefficients $\{\hat r^{ab}_{kj}\}$. This
task becomes particularly challenging when dealing with polar coordinates [10].
With PGD, to benefit from the computational advantages associated with a 1D
discretization, we are obliged to assume the full separability of $\Omega$; actually, a partial
separability demands a 1D partition for $\Omega_x$, and a two-dimensional partition of $\Omega_{\mathbf{y}}$.
As explained in Sect. 5, non-Cartesian domains require a 3D discretization of $\Omega$.

Finally, we analyze the interplay between the enrichment and the ADS iterations
in the PGD reduction. We investigate the possible relationship between $\mathrm{TOL}_{FP}$
in (14) and $\mathrm{TOL}_E$ in (13), to verify if a small tolerance for the fixed point iteration
improves the accuracy of the PGD approximation, thus reducing the number of
enrichment steps. To do this, we adopt the same test case used above. Table 1
gathers the number of ADS iterations, $\#IT_{FP}$, the number, $m$, of enrichment steps,
and the CPU time$^1$ (in seconds) demanded by the PGD procedure, for two different
values of $\mathrm{TOL}_E$ and three different choices of $\mathrm{TOL}_{FP}$. In particular, in column $\#IT_{FP}$
we specify the number of ADS iterations required by each enrichment step. As
expected, there exists a link between the two tolerances, namely, when a higher
accuracy constrains the fixed point iteration, a smaller number of enrichment steps
is performed to ensure the accuracy $\mathrm{TOL}_E$.


$^1$ The computations have been run on an Intel Core i5 Dual-Core CPU 2.7 GHz 8 GB RAM
MacBook.
Table 1 Quantitative analysis for PGD in terms of fixed point iterations and enrichment steps


TOLg = 2.102 TOLg = 8.10? 

                      #IT_FP       m   CPU [s]     #IT_FP             m   CPU [s]
TOL_FP = 10^{-1}      {2, 2, 2}    3   0.099640    {2, 2, 2, 2, 2}    5   0.337861
TOL_FP = 10^{-2}      {4, 3}       2   0.046756    {4, 3, 2, 2, 4}    5   0.358555
TOL_FP = 10^{-3}      {5, 5}       2   0.077958    {5, 5, 2, 7}       4   0.341748


5 HiMod Reduction and PGD for Parametrized Problems


The actual potential of PGD becomes more evident when considering a parametric
setting, i.e., when problem (1) is replaced by the formulation

$$\text{find } u(\boldsymbol{\mu}) \in V : \quad a(u(\boldsymbol{\mu}), v; \boldsymbol{\mu}) = F(v; \boldsymbol{\mu}) \quad \forall v \in V, \qquad (15)$$

with $\boldsymbol{\mu}$ a parameter, which may represent any data of the problem, e.g., the
coefficients of the considered PDE, the source term, a boundary value or the domain
geometry.

The technique adopted by PGD to deal with the parametric dependence in (15)
is very effective. Parameter $\boldsymbol{\mu}$ is considered as an additional independent variable
which varies in a domain $\Omega_\mu$ [5]. Thus, the PGD space (7) changes into the new one

$$W_m = \Big\{ w_m(x, y, \boldsymbol{\mu}) = \sum_{k=1}^{m} w_k^x(x)\, w_k^y(y)\, w_k^\mu(\boldsymbol{\mu}), \text{ with } w_k^x \in W_h^x,\ w_k^y \in W_h^y,\ w_k^\mu \in W_h^\mu,\ x \in \Omega_x,\ y \in \Omega_y,\ \boldsymbol{\mu} \in \Omega_\mu \Big\}, \qquad (16)$$

with $W_h^\mu$ a discretization of the space $L^2(\Omega_\mu; \mathbb{R}^Q)$, being $Q$ the length of vector $\boldsymbol{\mu}$.
Generalizing the enrichment paradigm in (8), at the $m$-th step of the PGD approach
applied to problem (15) we have to compute three unknown functions, $u_m^x$, $u_m^y$ and
$u_m^\mu$, by picking the test function as $X(x) Y(y) Z(\mu)$, with $X \in W_h^x$, $Y \in W_h^y$, $Z \in
W_h^\mu$. Functions $u_m^x$, $u_m^y$, $u_m^\mu$ are computed by ADS, which now coincides with a
three-step procedure. Thus, with reference to the Poisson problem, $-\nabla \cdot (\mu \nabla u) = f$,
completed with full homogeneous Dirichlet boundary conditions and for $\boldsymbol{\mu} = \mu$,
we first compute $u_m^{x,p}$ by identifying $u_m^y$ and $u_m^\mu$ with the previous approximations,
$u_m^{y,p-1}$ and $u_m^{\mu,p-1}$, respectively, and by selecting $Y(y) Z(\mu) = u_m^{y,p-1} u_m^{\mu,p-1}$ in the
test function. This leads to a linear system which generalizes (11), namely


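$$\big[ (\mathbf{u}_m^{\mu,p-1})^T M^\mu \mathbf{u}_m^{\mu,p-1} \big] \Big\{ \big[ (\mathbf{u}_m^{y,p-1})^T M^y \mathbf{u}_m^{y,p-1} \big] K^x + \big[ (\mathbf{u}_m^{y,p-1})^T K^y \mathbf{u}_m^{y,p-1} \big] M^x \Big\} \mathbf{u}_m^{x,p}$$
$$= \big[ (\mathbf{u}_m^{\mu,p-1})^T \mathbf{f}^\mu \big] \big[ (\mathbf{u}_m^{y,p-1})^T \mathbf{f}^y \big] \mathbf{f}^x - \sum_{k=1}^{m-1} \big[ (\mathbf{u}_m^{\mu,p-1})^T M^\mu \mathbf{u}_k^\mu \big] \Big\{ \big[ (\mathbf{u}_m^{y,p-1})^T M^y \mathbf{u}_k^y \big] K^x \mathbf{u}_k^x + \big[ (\mathbf{u}_m^{y,p-1})^T K^y \mathbf{u}_k^y \big] M^x \mathbf{u}_k^x \Big\}, \qquad (17)$$

where $M^\mu \in \mathbb{R}^{N_h^\mu \times N_h^\mu}$ is the mass matrix associated with the parameter $\mu$, with
$[M^\mu]_{ij} = \int_{\Omega_\mu} \mu\, \theta_i^\mu \theta_j^\mu\, d\mu$ for $i, j = 1, \ldots, N_h^\mu$ and $\mathcal{B}^\mu = \{\theta_i^\mu\}_{i=1}^{N_h^\mu}$ a basis for the
space $W_h^\mu$, $\mathbf{f}^\mu \in \mathbb{R}^{N_h^\mu}$ with $[\mathbf{f}^\mu]_l = \int_{\Omega_\mu} f^\mu \theta_l^\mu\, d\mu$ for $l = 1, \ldots, N_h^\mu$ after assuming
the separability $f = f^x f^y f^\mu$ for the source term $f$, and where we employ the same
notation as in (11)-(12) to denote vectors $\mathbf{u}_m^{\mu,s}$, $\mathbf{u}_k^\mu$, with $k = 1, \ldots, m-1$, $s =
p, p-1$, collecting the PGD coefficients associated with the basis $\mathcal{B}^\mu$. Analogously,
$u_m^{y,p}$ is computed by solving the generalization of the linear system (12) given by

$$\big[ (\mathbf{u}_m^{\mu,p-1})^T M^\mu \mathbf{u}_m^{\mu,p-1} \big] \Big\{ \big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_m^{x,p} \big] M^y + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_m^{x,p} \big] K^y \Big\} \mathbf{u}_m^{y,p}$$
$$= \big[ (\mathbf{u}_m^{\mu,p-1})^T \mathbf{f}^\mu \big] \big[ (\mathbf{u}_m^{x,p})^T \mathbf{f}^x \big] \mathbf{f}^y - \sum_{k=1}^{m-1} \big[ (\mathbf{u}_m^{\mu,p-1})^T M^\mu \mathbf{u}_k^\mu \big] \Big\{ \big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_k^x \big] M^y \mathbf{u}_k^y + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_k^x \big] K^y \mathbf{u}_k^y \Big\},$$

after setting $u_m^x = u_m^{x,p}$, $u_m^\mu = u_m^{\mu,p-1}$ and $X(x) Z(\mu) = u_m^{x,p} u_m^{\mu,p-1}$ for the PGD
test function. Finally, we have the additional linear system used to compute $u_m^{\mu,p}$,

$$\Big\{ \big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_m^{x,p} \big] \big[ (\mathbf{u}_m^{y,p})^T M^y \mathbf{u}_m^{y,p} \big] + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_m^{x,p} \big] \big[ (\mathbf{u}_m^{y,p})^T K^y \mathbf{u}_m^{y,p} \big] \Big\} M^\mu \mathbf{u}_m^{\mu,p}$$
$$= \big[ (\mathbf{u}_m^{x,p})^T \mathbf{f}^x \big] \big[ (\mathbf{u}_m^{y,p})^T \mathbf{f}^y \big] \mathbf{f}^\mu - \sum_{k=1}^{m-1} \Big\{ \big[ (\mathbf{u}_m^{x,p})^T K^x \mathbf{u}_k^x \big] \big[ (\mathbf{u}_m^{y,p})^T M^y \mathbf{u}_k^y \big] + \big[ (\mathbf{u}_m^{x,p})^T M^x \mathbf{u}_k^x \big] \big[ (\mathbf{u}_m^{y,p})^T K^y \mathbf{u}_k^y \big] \Big\} M^\mu \mathbf{u}_k^\mu,$$

obtained for $u_m^x = u_m^{x,p}$, $u_m^y = u_m^{y,p}$ and by selecting $X(x) Y(y) = u_m^{x,p} u_m^{y,p}$ for the
test function. From a computational viewpoint, at each ADS iteration, we have to
solve now three linear systems of order $N_h^x$, $N_h^y$, $N_h^\mu$, respectively.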
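
Once the PGD modes are available, evaluating the reduced solution for a new parameter value only requires sampling the stored one-dimensional functions. The sketch below illustrates this step under assumptions made for this example: each mode is stored as three lists of nodal values on the $x$, $y$ and $\mu$ grids, and values between nodes are recovered by piecewise linear interpolation. The data layout and the function names are hypothetical.

```python
from bisect import bisect_right


def interp(grid, values, t):
    """Piecewise linear interpolation of nodal values on an increasing grid."""
    if t <= grid[0]:
        return values[0]
    if t >= grid[-1]:
        return values[-1]
    j = bisect_right(grid, t) - 1
    w = (t - grid[j]) / (grid[j + 1] - grid[j])
    return (1.0 - w) * values[j] + w * values[j + 1]


def evaluate_pgd(modes, x_grid, y_grid, mu_grid, x, y, mu):
    """Sum over the modes of the product of the x-, y- and mu-components."""
    return sum(interp(x_grid, ux, x) * interp(y_grid, uy, y) * interp(mu_grid, um, mu)
               for ux, uy, um in modes)
```

No linear system is solved at this stage: the cost grows linearly with the number of modes, which is the plain evaluation of the reduced solution mentioned in the introduction.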

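We investigate the reliability of PGD on problem (15), for $V = H^1_{\Gamma_{in} \cup \Gamma_{up} \cup \Gamma_{down}}(\Omega)$
with $\Omega = (0, 3) \times (0, 1)$, $\Gamma_{in} = \{0\} \times (0, 1)$, $\Gamma_{up} = (0, 3) \times \{1\}$, $\Gamma_{down} =
(0, 3) \times \{0\}$, $a(u, v) = \int_\Omega [\mu \nabla u \cdot \nabla v + \mathbf{b} \cdot \nabla u\, v]\, d\Omega$ with $\mathbf{b} = [2.5, 0]^T$ and $\mu$ the
parameter to be varied in $\Omega_\mu = [1, 5]$, $F(v) = \int_\Omega f v\, d\Omega$ with $f = 1$. The problem
is completed with mixed boundary conditions, namely a homogeneous Dirichlet
data on $\Gamma_{up} \cup \Gamma_{down}$, the non-homogeneous Dirichlet condition, $u = u_{in}$ with $u_{in} =
y(1 - y)$, on $\Gamma_{in}$ and a homogeneous Neumann value on $\Gamma_{out} = \{3\} \times (0, 1)$. We apply
the PGD reduction for $m = 2$, and we uniformly subdivide $\Omega_x$, $\Omega_y$, $\Omega_\mu$, being $N_h^x =$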
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Fig. 2 Qualitative comparison between the reference (left) and the PGD (right) solutions, for и = 
1 (top) and u = 2.5 (bottom) 


150, № = 50, № = 500. The tolerance in (14) is set to 1072. Figure 2 compares 
the PGD approximation for и = 1 and u = 2.5 with a reference full solution 
coinciding with a linear FE approximation computed on a 300 x 100 structured 
mesh. The qualitative matching between the corresponding solutions is significant. 
From a quantitative viewpoint, the L?(Q)-norm of the relative error associated with 
the PGD approximation does not vary significantly by increasing m, whereas a slight 
error reduction is detected by increasing и. 

The parametric counterpart of the HiMod reduction, known as HiPOD, merges
HiMod with POD [4, 13]. HiPOD pursues a different goal with respect to PGD.
Indeed, for a new value, $\mu^*$, of the parameter, PGD provides an approximation for
the full solution $u(\mu^*)$, while HiPOD approximates the HiMod solution associated
with $\mu^*$. The offline/online paradigm of POD is also followed by HiPOD. The
peculiarity is that the offline step is now performed in the HiMod setting, to contain
the computational burden typical of this stage and to rely on the good properties
of HiMod in terms of reliability-versus-accuracy balance. Thus, we choose P
different values, $\mu = \mu_i$ with $i = 1, \dots, P$, for parameter $\mu$, and we collect
the HiMod approximation for the corresponding problem (15) into the response
matrix, $S = [\mathbf{u}_m^{HiMod}(\mu_1), \mathbf{u}_m^{HiMod}(\mu_2), \dots, \mathbf{u}_m^{HiMod}(\mu_P)] \in \mathbb{R}^{(mN_h) \times P}$, according to
representation (5). Successively, we define the null-average matrix

$$V = S - \frac{1}{P} \sum_{i=1}^{P} [\mathbf{u}_m^{HiMod}(\mu_i), \mathbf{u}_m^{HiMod}(\mu_i), \dots, \mathbf{u}_m^{HiMod}(\mu_i)] \in \mathbb{R}^{(mN_h) \times P},$$

and we apply the Singular Value Decomposition (SVD) to V, so that $V = \Phi \Sigma \Psi^T$,
where $\Phi \in \mathbb{R}^{(mN_h) \times (mN_h)}$ and $\Psi \in \mathbb{R}^{P \times P}$ are the unitary matrices of the left- and
of the right-singular vectors of V, respectively, while $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_\rho) \in \mathbb{R}^{(mN_h) \times P}$
denotes the pseudo-diagonal matrix of the singular values of V, being
$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_\rho \ge 0$ and $\rho = \min(mN_h, P)$ [8]. The POD basis is identified




by the first l left singular vectors, $\varphi_i$, of V, so that the reduced POD space is
$V_{POD}^l = \mathrm{span}\{\varphi_1, \dots, \varphi_l\}$, with $\dim(V_{POD}^l) = l$ and $l \le mN_h$. In the numerical
assessment below, value / coincides with the smallest integer such that of < €, with 
€ а prescribed tolerance. 

The online phase of HiPOD approximates the HiMod solution to problem (15)
for a new value, $\mu^*$, of the parameter by exploiting the POD basis instead of solving
system (6). This is performed via a projection step. After assembling the HiMod
stiffness matrix and right-hand side, $A_m^{HiMod}(\mu^*)$ and $\mathbf{f}_m^{HiMod}(\mu^*)$, associated with
the new value of the parameter, we solve the POD system of order l

$$A_{POD}(\mu^*) \, \mathbf{u}_{POD}(\mu^*) = \mathbf{f}_{POD}(\mu^*), \qquad (18)$$

where $A_{POD}(\mu^*) = (\Phi_{POD}^l)^T A_m^{HiMod}(\mu^*) \, \Phi_{POD}^l$ and $\mathbf{f}_{POD}(\mu^*) = (\Phi_{POD}^l)^T \mathbf{f}_m^{HiMod}(\mu^*)$
denote the POD stiffness matrix and right-hand side, respectively, with
$\Phi_{POD}^l = [\varphi_1, \dots, \varphi_l] \in \mathbb{R}^{(mN_h) \times l}$ the matrix collecting the POD basis vectors. The HiMod
solution is thus approximated by the vector $\Phi_{POD}^l \mathbf{u}_{POD}(\mu^*) \in \mathbb{R}^{mN_h}$, i.e., after solving
a system of order l instead of $mN_h$. Overall, HiPOD requires solving P linear
systems of order $mN_h$ during the offline phase, in addition to a system of order l
in the online phase.
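The projection step behind system (18) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the 4 x 4 tridiagonal matrix, the two basis vectors and all function names are hypothetical stand-ins for the HiMod arrays and the POD basis, and exact rational arithmetic replaces a numerical linear algebra library. It forms $\Phi^T A \Phi$ and $\Phi^T \mathbf{f}$, solves the small system, and lifts the result back to the full space.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(A)
    M = [list(map(F, row)) + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def pod_online(A_full, f_full, Phi):
    """Project the full system onto span(Phi), solve it, and lift the solution back."""
    Phi_T = transpose(Phi)
    A_pod = matmul(matmul(Phi_T, A_full), Phi)          # l x l reduced matrix
    f_pod = [sum(p * f for p, f in zip(row, f_full)) for row in Phi_T]
    u_pod = solve(A_pod, f_pod)                         # reduced coefficients
    u_lifted = [sum(p * c for p, c in zip(row, u_pod)) for row in Phi]
    return u_pod, u_lifted

# Hypothetical toy data: the columns of Phi are orthogonal, and u_exact lies in their span.
A_full = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
Phi = [[1, -3], [1, -1], [1, 1], [1, 3]]
u_exact = [-1, 1, 3, 5]                                 # 2 * column 0 + 1 * column 1
f_full = [sum(a * u for a, u in zip(row, u_exact)) for row in A_full]
u_pod, u_lifted = pod_online(A_full, f_full, Phi)
print(u_pod, u_lifted)
```

Because the toy solution lies in the span of the basis, the reduced solve reproduces it exactly; for a genuine POD basis the lifted vector is only an approximation whose quality depends on l.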

To check the performance of HiPOD, we adopt the test case used above for
PGD, for the same values of the parameter, $\mu^* = 1$ and $\mu^* = 2.5$. The reference
solution is the corresponding HiMod approximation computed by using m = 15
sinusoidal functions in the y-direction, and a linear FE discretization along the
mainstream based on a uniform subdivision of $\Omega_{1D}$ into 50 subintervals. The
same HiMod discretization is adopted to build the response matrix. Concerning
the HiPOD approximation, we pick P = 100 by uniformly sampling the interval
[1,5], and we select = = 2.5. 10- P, This choice sets the dimension of the POD 
space to l = 8, so that we have to solve a system of order 8 instead of 750. The
contour plots in Fig. 3 qualitatively compare the HiMod solution with the HiPOD
approximation for l = 1. The correspondence between the two approximations
is good even though a single POD mode is employed (in such a case, system (18)


            l = 1      l = 4      l = 6      l = 8
μ* = 1      4.06e-02   2.53e-06   1.79e-09   5.58e-12
μ* = 2.5    2.74e-03   1.11e-07   4.05e-10   1.21e-13
random      7.19e-03   2.65e-07   2.97e-10   3.66e-13

Fig. 3 Contour plots: comparison between the reference HiMod solution (left) and the HiPOD
approximation with l = 1 (right), for $\mu^* = 1$ (top) and $\mu^* = 2.5$ (bottom). Table: relative error
between HiMod and HiPOD solutions with respect to the $L^2(\Omega)$-norm
reduces to a scalar equation). We do not provide the HiPOD approximations for
l = 8 since they qualitatively coincide with the corresponding HiMod solution.
The left panels can be additionally compared with the FE solutions in Fig. 2 to
verify the reliability of the HiMod procedure. Finally, the table in Fig. 3 gathers
the $L^2(\Omega)$-norm of the relative error between HiMod and HiPOD solutions, for four
different POD bases and for three choices of the viscosity (1, 2.5 and the average
over a sampling of 30 random values of $\mu$). The error monotonically decreases for
larger and larger values of l, independently of the choice for $\mu$. If we compare
the values for $\mu = 1$ and for $\mu = 2.5$ (one of the endpoints and an interior point
of the sampling interval, respectively), we notice a higher accuracy (of about one
order of magnitude) for the latter choice. This is rather standard in projection-based
reduced order modeling [11]. Concerning the computational saving in terms of CPU
time, the HiPOD method requires on average $O(10^{-3})$ [s], to be compared with $O(10)$ [s]
demanded by HiMod, resulting in a speedup of $10^4$.

Although PGD and HiPOD are not directly comparable due to the different
purposes they pursue, we highlight the main pros and cons of the two methods.
The explicit dependence of the approximation on the parameters makes PGD an
ideal tool to efficiently deal with parametric problems. For any new parameter,
a direct evaluation yields the corresponding PGD approximation. On the other
hand, HiPOD suffers from the drawbacks typical of projection-based methods. The
main bottleneck is the assembling of the HiMod arrays involved in $A_{POD}(\mu^*)$ and
$\mathbf{f}_{POD}(\mu^*)$.

When PGD is applied to parametric problems, we recover the possibility to deal
with any geometric domain. In such a case, a partial separability is applied to the
problem, so that the space independent variables are kept together whereas parameters
are separated. This approach clearly loses the computational advantages due
to space separability. On the contrary, HiPOD inherits the geometric flexibility of
the HiMod reduction, without giving up the spatial dimensional reduction of the
problem.
Recurrence Relations for a Family of
Orthogonal Polynomials on a Triangle


Sheehan Olver, Alex Townsend, and Geoffrey M. Vasil 


1 Introduction 


In 1975, Koornwinder described a general procedure for constructing multivariate
orthogonal polynomials from univariate ones [4, §3.7.2]. The procedure allows for
the construction of seven classes of bivariate orthogonal polynomials from Jacobi
polynomials, some of which were previously known [9]. In this paper, we consider a
four-parameter variant of Koornwinder's class IV polynomials (the four-parameter
variant was not constructed by Koornwinder) defined as [2]

$$P_{n,k}^{(a,b,c,d)}(x,y) = P_{n-k}^{(2k+b+c+d+1,a)}(2x-1) \, (1-x)^k \, P_k^{(c,b)}\left(-1 + \frac{2y}{1-x}\right)
= \tilde P_{n-k}^{(2k+b+c+d+1,a)}(x) \, (1-x)^k \, \tilde P_k^{(c,b)}\left(\frac{y}{1-x}\right), \qquad (1)$$

where a, b, c > -1, n and k are integers such that $n \ge k \ge 0$, $P_k^{(a,b)}(x)$ is the
Jacobi polynomial of degree k [7, Table 18.3.1], and $\tilde P_k^{(a,b)}$ is the Jacobi polynomial
of degree k shifted to have support on (0, 1). Koornwinder's construction derives
the polynomials with d = 0, which we denote by $P_{n,k}^{(a,b,c)}$. The polynomials in (1)
are orthogonal on the right-angled triangle $\{(x, y) : 0 < x < 1, \ 0 < y < 1 - x\}$
with respect to the weight function $w_{a,b,c,d}(x, y) = x^a y^b (1 - x - y)^c (1 - x)^d$.
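The orthogonality statement can be checked exactly for small degrees. The sketch below is an illustration, not the authors' code: it assumes integer parameters $a, b, c, d \ge 0$, uses the standard hypergeometric sum for Jacobi polynomials, and represents each polynomial $\tilde P_{n-k}^{(2k+b+c+d+1,a)}(x)(1-x)^k \tilde P_k^{(c,b)}(y/(1-x))$ in the variables $w = 1 - x$ and $z = 1 - x - y$, for which monomial integrals against the weight $x^a y^b z^c (1-x)^d$ reduce to products of beta functions. All function names and the parameter choice are hypothetical.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial

def shifted_jacobi(n, alpha, beta):
    """Coefficients g_m with P~_n^{(alpha,beta)}(s) = sum_m g_m (1 - s)^m on (0, 1)."""
    g = F(1)
    for i in range(n):                       # g_0 = (alpha + 1)_n / n!
        g *= F(alpha + 1 + i, i + 1)
    coeffs = []
    for m in range(n + 1):
        coeffs.append(g)
        g = g * (m - n) * (n + alpha + beta + 1 + m) / ((alpha + 1 + m) * (m + 1))
    return coeffs

def triangle_poly(n, k, a, b, c, d):
    """The polynomial with indices (n, k) as {(p, q): coeff}, i.e. sum of coeff * w^p * z^q."""
    outer = shifted_jacobi(n - k, 2 * k + b + c + d + 1, a)   # powers of w = 1 - x
    inner = shifted_jacobi(k, c, b)                           # powers of z / w
    poly = {}
    for (m, G), (j, g) in product(enumerate(outer), enumerate(inner)):
        key = (m + k - j, j)                 # w^m * w^k * (z / w)^j
        poly[key] = poly.get(key, F(0)) + G * g
    return poly

def beta_int(p, q):
    """Euler beta function B(p, q) for positive integers p and q."""
    return F(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))

def inner_product(P, Q, a, b, c, d):
    """Integral of P * Q * x^a * y^b * z^c * (1 - x)^d over the triangle."""
    total = F(0)
    for (p1, q1), u in P.items():
        for (p2, q2), v in Q.items():
            p, q = p1 + p2, q1 + q2
            # The substitution y = w * s splits the integral into two beta integrals.
            total += u * v * beta_int(a + 1, b + c + d + p + q + 2) * beta_int(b + 1, c + q + 1)
    return total

params = (1, 0, 2, 1)                        # hypothetical integer choice of (a, b, c, d)
indices = [(n, k) for n in range(4) for k in range(n + 1)]
polys = {nk: triangle_poly(*nk, *params) for nk in indices}
gram = {(i, j): inner_product(polys[i], polys[j], *params) for i in indices for j in indices}
off_diagonal = [v for (i, j), v in gram.items() if i != j]
print(all(v == 0 for v in off_diagonal))
```

Working with exact fractions avoids any tolerance: every off-diagonal entry of the Gram matrix is identically zero, while the diagonal entries are positive.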

The basis р” '© has been used extensively by the spectral ая community, 
see the м. in [3]. The recurrence relations we derive can be employed to 
reduce partial differential operators to sparse matrices, enabling efficient solution of 
linear partial differential equations defined on triangles, which will be the topic of a 
future paper. This is analogous to the ultraspherical spectral method for solving 
ordinary differential equations on bounded intervals [8]. A similar idea using a 
hierarchy of Zernike polynomials, which are bivariate orthogonal polynomials on 
the unit disk, is used in [13] to develop a sparse spectral method for solving 
partial differential equations defined on the disk [13]. On the disk, polar coordinates 
allow for radially symmetric partial differential operators to be reduced to ordinary 
differential operators acting on Jacobi polynomials [13]. This simplification does 
not translate to non-radially symmetric partial differential operators on the disk, nor 
partial differential operators on the triangle. 

Several of the formulae in this paper have already been derived by directly
employing recurrence relations satisfied by Jacobi polynomials [14]. Our approach
via ladder operators is a more systematic study that derives previously unreported
recurrence relations for these polynomials. We also hope to use ladder operators to derive
sparse recurrence relations for multivariate orthogonal polynomials built from
Jacobi polynomials on higher-dimensional simplices.

Throughout this paper, the recurrence relations hold for choices of the parameters
n, k, a, b, c and d that make the Jacobi polynomials well-defined. Moreover, we
take $P_{-1}^{(a,b)}(x) = 0$. Also, note that orthogonal polynomials remain orthogonal after
an affine transformation, so the recurrence relations in this paper for (1) on a right-
angled triangle can be extended to any triangle, including triangles with the corners
permuted.

The paper is structured as follows. In the next section, we give 12 ladder operators
for Jacobi polynomials and use them to derive sparse recurrence relations for $P_n^{(a,b)}$.
In Sect. 3 we give 24 ladder operators for (1) and write down the corresponding
sparse recurrence relations for $P_{n,k}^{(a,b,c,d)}$. In Sect. 4, we use the ladder operators to
derive a collection of sparse recurrence relations for differentiation, conversion, and
multiplication that are satisfied by $P_{n,k}^{(a,b,c)}$. Section 5 applies these sparse recurrence
relations to efficiently calculating Laplacians of functions on the triangle.


2 Ladder Operators for Jacobi Polynomials 


We give 12 ordinary differential operators that increment or decrement the parameters
and degree of Jacobi polynomials by zero or one. Each ladder operator maps
$P_n^{(a,b)}(x)$ to $P_{\tilde n}^{(\tilde a,\tilde b)}(x)$, where $|\tilde n - n| \le 1$, $|\tilde a - a| \le 1$, and $|\tilde b - b| \le 1$.

Definition 1 The following operators are ladder operators for Jacobi polynomials:

$$\begin{aligned}
L_1 u &= \frac{du}{dx}, & L_1^\dagger u &= ((1+x)a - (1-x)b)u - (1-x^2)\frac{du}{dx}, \\
L_2 u &= (a+b+n+1)u + (1+x)\frac{du}{dx}, & L_2^\dagger u &= (2a + (1-x)n)u - (1-x^2)\frac{du}{dx}, \\
L_3 u &= (a+b+n+1)u - (1-x)\frac{du}{dx}, & L_3^\dagger u &= (2b + (1+x)n)u + (1-x^2)\frac{du}{dx}, \\
L_4 u &= ((1+x)a - (1-x)(b+n+1))u - (1-x^2)\frac{du}{dx}, & L_4^\dagger u &= -nu + (1+x)\frac{du}{dx}, \\
L_5 u &= ((1+x)(a+n+1) - (1-x)b)u - (1-x^2)\frac{du}{dx}, & L_5^\dagger u &= nu + (1-x)\frac{du}{dx}, \\
L_6 u &= bu + (1+x)\frac{du}{dx}, & L_6^\dagger u &= au - (1-x)\frac{du}{dx}.
\end{aligned}$$

The notation for the ladder operators is chosen so that $L_s^\dagger L_s P_n^{(a,b)}$ and
$L_s L_s^\dagger P_n^{(a,b)}$ are scalar multiples of $P_n^{(a,b)}$ for $1 \le s \le 6$. These ladder operators
are carefully constructed to give rise to sparse recurrence relations for Jacobi
polynomials.


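Lemma 1 The ladder operators give sparse recurrence relations for Jacobi polynomials:

$$\begin{aligned}
L_1 P_n^{(a,b)} &= \tfrac{1}{2}(n+a+b+1) P_{n-1}^{(a+1,b+1)}, & L_1^\dagger P_n^{(a,b)} &= 2(n+1) P_{n+1}^{(a-1,b-1)}, \\
L_2 P_n^{(a,b)} &= (n+a+b+1) P_n^{(a+1,b)}, & L_2^\dagger P_n^{(a,b)} &= 2(n+a) P_n^{(a-1,b)}, \\
L_3 P_n^{(a,b)} &= (n+a+b+1) P_n^{(a,b+1)}, & L_3^\dagger P_n^{(a,b)} &= 2(n+b) P_n^{(a,b-1)}, \\
L_4 P_n^{(a,b)} &= 2(n+1) P_{n+1}^{(a-1,b)}, & L_4^\dagger P_n^{(a,b)} &= (n+b) P_{n-1}^{(a+1,b)}, \\
L_5 P_n^{(a,b)} &= 2(n+1) P_{n+1}^{(a,b-1)}, & L_5^\dagger P_n^{(a,b)} &= (n+a) P_{n-1}^{(a,b+1)}, \\
L_6 P_n^{(a,b)} &= (n+b) P_n^{(a+1,b-1)}, & L_6^\dagger P_n^{(a,b)} &= (n+a) P_n^{(a-1,b+1)}.
\end{aligned}$$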


Fig. 1 Illustration of the 12 ladder operators for Jacobi polynomials in Definition 1

Proof The relationship for $L_1$ is a formula for the derivative of $P_n^{(a,b)}(x)$ [7,
18.9.15] and the relationship for $L_1^\dagger$ is equivalent to [7, 18.9.16]. Six more follow from
expressing the left- and right-hand sides in terms of ${}_2F_1$ functions using [7, 18.5.7]
and the reflection formula $P_n^{(a,b)}(x) = (-1)^n P_n^{(b,a)}(-x)$: $L_4^\dagger$ and $L_5^\dagger$ are equivalent
to [7, 15.5.3], $L_6^\dagger$ is equivalent to [7, 15.5.4], $L_4$ and $L_5$ are equivalent to [7, 15.5.5],
and $L_6$ is equivalent to [7, 15.5.6].

The relationship for $L_2$ follows from combining [7, 18.9.5] and [7, 18.9.6]. The
relationship for $L_2^\dagger$ follows by writing

$$L_2^\dagger = L_1^\dagger + (n+a+b)(1-x)$$

and then using [7, 18.9.5] and [7, 18.9.6]. Finally, $L_3$ and $L_3^\dagger$ follow just as $L_2$ and
$L_2^\dagger$, using the reflection formula.
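The twelve relations of Lemma 1 lend themselves to a mechanical check. The following sketch is only a verification aid under stated assumptions: it builds $P_n^{(a,b)}$ from the standard hypergeometric sum in powers of $(1-x)/2$, takes $P_{-1}^{(a,b)} = 0$, encodes each ladder operator as "polynomial times u plus polynomial times u'", and compares both sides in exact rational arithmetic for n = 0, ..., 4. The helper names and the parameter values a = 3/2, b = 5/3 are hypothetical choices.

```python
from fractions import Fraction as F

def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def scale(c, p):
    return [c * v for v in p]

def mul(p, q):
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out

def diff(p):
    return [i * v for i, v in enumerate(p)][1:] or [F(0)]

def jacobi(n, a, b):
    """Ascending coefficients in x of P_n^{(a,b)}(x); the zero polynomial for n < 0."""
    if n < 0:
        return [F(0)]
    g = F(1)
    for i in range(n):                       # (a + 1)_n / n!
        g *= (a + 1 + i) / F(i + 1)
    result, power = [F(0)], [F(1)]           # power holds ((1 - x) / 2)^m
    for m in range(n + 1):
        result = add(result, scale(g, power))
        g = g * (m - n) * (n + a + b + 1 + m) / ((a + 1 + m) * (m + 1))
        power = mul(power, [F(1, 2), F(-1, 2)])
    return result

def lin(c0, cp, cm):
    """The polynomial c0 + cp * (1 + x) + cm * (1 - x)."""
    return [F(c0) + cp + cm, F(cp) - cm]

ONE, XP, XM, W = [F(1)], [F(1), F(1)], [F(1), F(-1)], [F(1), F(0), F(-1)]
NXM, NW = scale(-1, XM), scale(-1, W)

# name: (coefficient of u, coefficient of u', scalar factor, shift of (n, a, b))
LADDER = {
    "L1":  (lambda n, a, b: lin(0, 0, 0), ONE, lambda n, a, b: (n + a + b + 1) / F(2), (-1, 1, 1)),
    "L1+": (lambda n, a, b: lin(0, a, -b), NW, lambda n, a, b: 2 * (n + 1), (1, -1, -1)),
    "L2":  (lambda n, a, b: lin(n + a + b + 1, 0, 0), XP, lambda n, a, b: n + a + b + 1, (0, 1, 0)),
    "L2+": (lambda n, a, b: lin(2 * a, 0, n), NW, lambda n, a, b: 2 * (n + a), (0, -1, 0)),
    "L3":  (lambda n, a, b: lin(n + a + b + 1, 0, 0), NXM, lambda n, a, b: n + a + b + 1, (0, 0, 1)),
    "L3+": (lambda n, a, b: lin(2 * b, n, 0), W, lambda n, a, b: 2 * (n + b), (0, 0, -1)),
    "L4":  (lambda n, a, b: lin(0, a, -(b + n + 1)), NW, lambda n, a, b: 2 * (n + 1), (1, -1, 0)),
    "L4+": (lambda n, a, b: lin(-n, 0, 0), XP, lambda n, a, b: n + b, (-1, 1, 0)),
    "L5":  (lambda n, a, b: lin(0, a + n + 1, -b), NW, lambda n, a, b: 2 * (n + 1), (1, 0, -1)),
    "L5+": (lambda n, a, b: lin(n, 0, 0), XM, lambda n, a, b: n + a, (-1, 0, 1)),
    "L6":  (lambda n, a, b: lin(b, 0, 0), XP, lambda n, a, b: n + b, (0, 1, -1)),
    "L6+": (lambda n, a, b: lin(a, 0, 0), NXM, lambda n, a, b: n + a, (0, -1, 1)),
}

def residual(name, n, a, b):
    """Left-hand side minus right-hand side of one relation, as a coefficient list."""
    coef, dcoef, factor, (dn, da, db) = LADDER[name]
    u = jacobi(n, a, b)
    lhs = add(mul(coef(n, a, b), u), mul(dcoef, diff(u)))
    rhs = scale(factor(n, a, b), jacobi(n + dn, a + da, b + db))
    return add(lhs, scale(-1, rhs))

a, b = F(3, 2), F(5, 3)                      # hypothetical parameters with a - 1, b - 1 > -1
all_ok = all(all(v == 0 for v in residual(name, n, a, b)) for name in LADDER for n in range(5))
print(all_ok)
```

The table doubles as a compact summary of Fig. 1: the last entry of each row records how the operator shifts the degree and the two parameters.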


Remark 1 We note that the first-order differential operators occurring in Lemma 1
together form an action of the Lie algebra $\mathfrak{sl}(4)$ [1, 5, 6].

Figure 1 illustrates the ladder operators and how they increment or decrement the
parameters associated to a Jacobi polynomial.
The ladder operators can be easily adapted to the shifted Jacobi polynomials,
denoted by $\tilde P_n^{(a,b)}$, which are supported on (0, 1), with x in place of 1 + x, and no
factors of 2. The corresponding recurrence relations for $\tilde P_n^{(a,b)}$ are the same as in
Lemma 1, except the multiplicative factors of 1/2 and 2 are replaced by 1.


3 Ladder Operators for $P_{n,k}^{(a,b,c,d)}$


The 12 ladder operators for the Jacobi polynomials in Sect. 2 allow us to derive 24
ladder operators for $P_{n,k}^{(a,b,c,d)}$. The ladder operators are carefully defined so that
they map $P_{n,k}^{(a,b,c,d)}$ to a scalar multiple of $P_{\tilde n,\tilde k}^{(\tilde a,\tilde b,\tilde c,\tilde d)}$, where the new parameters
$\tilde n, \tilde k, \tilde a, \tilde b, \tilde c, \tilde d$ are n, k, a, b, c or d, respectively, incremented or decremented by 0 or 1.
To highlight the symmetries of the right-angled triangle and make the recurrences
more convenient to write down, we define

$$z := 1 - x - y \quad \text{and} \quad \frac{\partial}{\partial z} := \frac{\partial}{\partial y} - \frac{\partial}{\partial x},$$

as in [14]. Now, the variables x, y, and z have the convenient property that any
affine transformation that maps the triangle onto itself has the effect of exchanging
the roles of x, y, and z.


Definition 2 The following operators are ladder operators for $P_{n,k}^{(a,b,c,d)}$. The first
set of 12 are:


Моли = 3% 


Мори = (k+b+c+ Пи + ух 


Мози = (k+b+c+ 1u- x3 
Mo,4u = (yc — z(b + k + Пи — уг 


+ 
Mo,su = (y(c +k + 1))и — zb — yz% My su = ти + (1 - ex) 9и 


en ди 1 __ ди 
Mo,6u = си — 275 Mo gu = bu + уўу. 


The second set of 12 are: 
Мон = тън B= ү; 
МЇ gt = (х(Е#+а+Ь+с+4+1)—а)и— x(1 — х)$# + хуйх 
Мои = (1+К+а+Ь+с+4-+2)и+ gu +x% -E 
MÌ jo = (n -- kb cd 1—xn)u — x(0 — х) х + ху$х 
Мз ои = (n Fa bcd + 2)и - (1 х) + ух 


My ои = (а + хп)и + x(1 — х) $4 = хуйх 


Maou = (х(п+а+Ь+с+4+2)—-а—п+К—1)и—х(1 х)$ + худи 


il k 8 xy à 
My gu = гхи — nu b ХЗ — р 95, Мои = nu + (1 — х)“ — уд“ 


y 
MÅ gu =x(n+a+b+c+d + 2)и – au — x(1 — х)$ + хуйх 
Mou = au + du +x% — E, 


МЕ ou = (k E b Ec d 1u- (1-х) + уйк. 


The notation for the ladder operators is chosen so that the recurrence relations in
Theorem 1 are derived for $M_{s,0}$ (resp. $M_{0,s}$) by applying $L_s$ or $L_s^\dagger$ to the first
(resp. second) Jacobi polynomial in $P_{n,k}^{(a,b,c,d)}(x,y)$ for $1 \le s \le 6$. Moreover,
we know that $M_{s,0}^\dagger M_{s,0} P_{n,k}^{(a,b,c,d)}$, $M_{0,s}^\dagger M_{0,s} P_{n,k}^{(a,b,c,d)}$, $M_{s,0} M_{s,0}^\dagger P_{n,k}^{(a,b,c,d)}$, and
$M_{0,s} M_{0,s}^\dagger P_{n,k}^{(a,b,c,d)}$ are scalar multiples of $P_{n,k}^{(a,b,c,d)}$ for $1 \le s \le 6$.
The ladder operators in Definition 2 correspond to 24 sparse recurrence relations
for $P_{n,k}^{(a,b,c,d)}$. To derive these recurrences, we first express the partial derivatives of
$P_{n,k}^{(a,b,c,d)}$ as derivatives of shifted Jacobi polynomials.


Proposition 1 The following relationships hold: 


p(2k+b+c+d+1, 5.5) (_y i b,c,d 
pE ут — yy BO] (PE) ар, у), (2) 


1—x 


5 Qk--b--c--d--1,a) | 5 (cb й а „Б.с, 
Г Tc ann са — xy pe (4) = (k La ху vè) p c Ge ah, 
(3) 


Proof The first relationship is immediate. The second relationship follows from the 
chain-rule: 


tae (rea - xg (т®)) = Гоа - ха (12) 
-EOU — xy sco + yf @) х) (12) 


and an application of (2) to simplify the last term. 


The 24 sparse recurrence relations for $P_{n,k}^{(a,b,c,d)}$ are given in the following
theorem.


Theorem 1 Let t = a + b + c + d. The first set of 12 are:


Mo Pre? = EFDA CHD PTT Myr = EE DPA S 


Ма” = ЕС РРО My PO = OE ТО 


МР” = ABA CHIP t МР = ОР ee 


ов = К DPA et мў Pei? = рено 


(a,b,c,d) __ (a,b—1,c,d—1) 
Мо,5 Р, к = (k + ОР, 


+ (a,b,c,d) __ (a,b+1,c,d+1) 
Ns SP, к = НОР, урт 


Моб PAP = (К+ ey part ,с—1,4) МЕБ = (К+ курене). 
The second set of 12 are: 


Mage eee = (n +k+t+ nee tbe rl) 


n—l,k 
„Б.с, —Lb,c,d-1 
МЇ ES „= (п — К+ De ° 


Moo PAP = (ntk+t+2P4 


T (a,b,c,d) __ 
М 0Р, к 


(a,b,c,d+1) 


(n+k+t-—at 1) 2.6640 
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Maso Pie =(n+k+t+ De сыс) 


Mie —(n—k4 gp ш 
Mab? окре 

Mi PA = (п + appe 

Ms, o PEPP =(n+k+t—a+ уред 
Meee =(n—k+ се ш 

Ма —(n—k-4 аура ео 


Mi SPA"? = (ntk+t—at 1)р 0070), 


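Proof We present the proof of $M_{0,1} P_{n,k}^{(a,b,c,d)} = (k+b+c+1) P_{n-1,k-1}^{(a,b+1,c+1,d)}$. By
the definition of $P_{n,k}^{(a,b,c,d)}$ in (1), the chain rule, and the relationship in (2), we have

$$\frac{\partial}{\partial y} P_{n,k}^{(a,b,c,d)}(x,y) = \tilde P_{n-k}^{(2k+b+c+d+1,a)}(x) \, (1-x)^{k-1} \, \big[\tilde P_k^{(c,b)}\big]'\left(\frac{y}{1-x}\right)
= (k+b+c+1) \, \tilde P_{n-k}^{(2k+b+c+d+1,a)}(x) \, (1-x)^{k-1} \, \tilde P_{k-1}^{(c+1,b+1)}\left(\frac{y}{1-x}\right), \qquad (4)$$

where the last equality comes from applying $L_1$ in Definition 1. The final expression
in (4) is equivalent to $(k+b+c+1) P_{n-1,k-1}^{(a,b+1,c+1,d)}$. The manipulations for the
remaining recurrence relations are similar, except with different choices of the
operators $L_s$ or $L_s^\dagger$ and combinations of (2) and (3).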


4 Sparse Recurrence Relations for $P_{n,k}^{(a,b,c)}$

We can combine the ladder operators in Sect. 3 to derive sparse recurrence relations
between $P_{n,k}^{(a,b,c)}$ polynomials with different parameters and their partial derivatives.
These recurrence relations are analogous to many of the sparse recurrence relations
for Jacobi polynomials [7, §18.9].


4.1 Differentiation


The partial derivatives of $P_{n,k}^{(a,b,c)}$ can be written in terms of Jacobi polynomials
on the triangle with incremented parameters, which is analogous to a recurrence
relation for the derivative of a Jacobi polynomial [7, 18.9.15]. A similar recurrence
for $P_{n,k}^{(a,b,c)}$ can be found in [14, Prop. 4.6, 4.7, & 4.8].


Corollary 1 The following recurrence relations hold:

$$(2k+b+c+1) \frac{\partial}{\partial x} P_{n,k}^{(a,b,c)} = (n+k+a+b+c+2)(k+b+c+1) P_{n-1,k}^{(a+1,b,c+1)}
+ (k+b)(n+k+b+c+1) P_{n-1,k-1}^{(a+1,b,c+1)}, \qquad (5)$$

$$\frac{\partial}{\partial y} P_{n,k}^{(a,b,c)} = (k+b+c+1) P_{n-1,k-1}^{(a,b+1,c+1)}, \qquad (6)$$

$$(2k+b+c+1) \frac{\partial}{\partial z} P_{n,k}^{(a,b,c)} = -(n+k+a+b+c+2)(k+b+c+1) P_{n-1,k}^{(a+1,b+1,c)}
+ (k+c)(n+k+b+c+1) P_{n-1,k-1}^{(a+1,b+1,c)}. \qquad (7)$$


Proof The recurrence (5) follows from the fact that (МоМо 2 + Mj Mi и = 


Qk+b+cect+ 1) 34 when 4 = 0. The relationship (6) is equivalent to the relation 
given by Мо in Theorem 1 when d = 0. Finally, (7) follows from the fact that 


(Mi,0Mo,3 — My Mi Qu = -Qk b c4 1) 4. 


The derivatives of weighted versions of $P_{n,k}^{(a,b,c)}$ also satisfy sparse recurrence
relations, which are analogous to an expression for the derivative of a weighted
Jacobi polynomial [7, 18.9.16].


Corollary 2 The following recurrence relations hold: 


—(2k+b+c+1)È Ga №) = xt lyze! (« +с)(п = k+ une 


++ 0 РТ), 


д a b.c p(a.b,c)\ __ a,,b—1_c—1 p(a,b—1,c—1) 
&( yz Pak )=-@+1)х y < Pratl , 


Qk- bct 1) (уре = xt! yea +b)\n—k+ PETTO 


би k+ ав) 


= 


Proof The first recurrence follows from 
(Mİ MIT o + Mo,4M6,0)u = (2k + b + c + 1) (ex — az — хи 
=—-(2k+b+et+ Dxl-*z17* 2- (x z*u). 
The second recurrence holds since 


T 1-b, l-c b 
Муди = (су — bz — угри = —Y "x О’. 
The third recurrence is derived from the fact that
(MÍ ЗМ o — Mo,sMo,ou = (2k +b + c + (bx — ay + хубе — Жи 


= (2k +b + cH 1x *y eB xt усы), 


4.2 Conversion 


Recurrence relations for conversion allow us to express $P_{n,k}^{(a,b,c)}$ in terms of Jacobi
polynomials on the triangle with different parameters. Here, we give the recurrence
relations that increment the parameters, which are analogues of [7, 18.9.3]. Similar
relations can be found in [14, Prop. 4.4].


Corollary 3 The following recurrence relations hold:

$$(2n+a+b+c+2) P_{n,k}^{(a,b,c)} = (n+k+a+b+c+2) P_{n,k}^{(a+1,b,c)} + (n+k+b+c+1) P_{n-1,k}^{(a+1,b,c)}, \qquad (8)$$
QQn+a+b+c4+2)2k+b+c+ 1) pe^? 
=(n+k+a+b+c+2)(k+b+ce+ РРО 
— (n —k-a)(k t b-E c DPE + kt Qntk+b+o+ РРО 
-= (e) — k+ DPRP O, 0) 
(2n+a+b+c+2(2k+b+c+ ja 
=(n+k+a+b+c+2(k+b+c+ oe i 
—(n—ktayk+b+e+ Рр) — (Е Б) ФЕЬ с DP E 


+ (6+ b)(n — k+ РР), (10) 


Proof The recurrence relation in (8) follows from the fact that (M30 + М50)и = 
(2n -- a 4- b 4- c 4-2)u when d = 0. Since (Mo,o— Mj o)u = (2nt+a+b+c+d+2)u 


and (MG o — Мо)и = Qn+a+b+c+d+2)(1 — x)u, we obtain 
(Mo,2M2,0— Mo 2M, o — V МЎ +) 4M4,o)u = (2k+b+c+1)(2n+a+b+ct+d+2)u, 
The recurrence relation (10) immediately follows. Similarly, (9) holds since 


(Mo,3M2,0—Mo,3Mj, 9+ Mo, 5 МЎ у= Mi, 5 Ma,o)u = (2k+b+c+1)(2n+a+bt+ct+d+2)u. 
4.3 Multiplication


Recurrence relations for multiplication allow one to express $x P_{n,k}^{(a,b,c)}$, $y P_{n,k}^{(a,b,c)}$, and
$z P_{n,k}^{(a,b,c)}$ in terms of a sum of Jacobi polynomials on the triangle with potentially
different parameters. The recurrences in Corollary 4 are analogous to the recurrence
relations for $P_n^{(a,b)}$ found in [7, 18.9.6].


Corollary 4 The following recurrence relations hold:

$$(2n+a+b+c+2) \, x P_{n,k}^{(a,b,c)} = (n-k+a) P_{n,k}^{(a-1,b,c)} + (n-k+1) P_{n+1,k}^{(a-1,b,c)}, \qquad (11)$$


Qk  b-- c Dn +а+Ь+с+2уР@ O = e byn kb cx NPG? О 


=i nok ты о x4 Per, 


++ DG Kad br c4 Deer, 


(12) 
Qk 4- b-- c 1) Qn a b + c 2) 2Р0 = (Е-+ суз E k bc ра) 


+ (+ Dn - ka) PED — (К+ сул —k+ PERTY 


(kt D(n-- ka bd c 2) РТР. 
(13) 


Proof The recurrence relation in (11) follows from the fact that (V o + MÉ Qu = 
Qn+a+b+c+d+2)xu. Since 


(MÌ, Mio- Miss Mao + Mos Моо — Mos Mg) = (2k+b+c+1)(2n+a+b+c+d+2)yu 

holds, we find that (12) is satisfied. Finally, (13) follows from 

(MiMi) -Mi Mao-- Мод Mig — Mo4Mao)u = (2k+b+c+1)2n+a+b+c+d+2)zu. 
Combining the recurrence relations in Corollaries 3 and 4, we can derive
expressions for $x P_{n,k}^{(a,b,c)}$, $y P_{n,k}^{(a,b,c)}$, and $z P_{n,k}^{(a,b,c)}$ in terms of a sum of Jacobi
polynomials on the triangle with parameters (a, b, c). These are analogous to
the three-term recurrence relation for Jacobi polynomials [7, 18.9.2]. Since these
recurrence relations are long, we refer the reader to [2, pp. 80-81].
4.4 Differential Eigenvalue Problems


The polynomials $P_{n,k}^{(a,b,c)}$ are eigenfunctions for second-order differential operators
(see [2, (5.3.4)] and [14, Prop. 4.11]), and the ladder operators in Sect. 3 make it
easy to derive this fact.


Theorem 2 The polynomial $P_{n,k}^{(a,b,c)}$ satisfies two second-order differential eigenvalue
problems:


zy es PAS 9 HADA- -HN d; PAL? = АБ c DPQ? 


and 


x(1— х) УР; o) _ 2xyd x mb ,b,c) +y- VERG b,c) 


F(a--1—(a- bct 3)х) E PAP? e 1 (a bct 3$ Pa? 


= —n(n - a b Fc d 2) PAP (y), 
Proof The first equation follows from 
Mo MÌ Р@#©®(х, y) = kk +b d c Р P, у) 


and the second from 


Moi MÀ = (ЕБС 1) А i5 
о o Maa PO a) 


1—x 


= (14-a - k  n)G +2 + 2n) PA; (x, у). 


5 Application: Calculating Laplacians 


We can use the recurrence relationships in this paper to calculate partial derivatives
too. Slevinsky's fast triangle transform [11] (which builds on his fast spherical
harmonic transform [10]) as implemented in the FastTransform multithreaded C
code [12] gives an efficient and stable routine for calculating the expansion coefficients
on the triangle in $O(d^2 \log^2 d)$ operations, where d is the polynomial degree.
The partial derivative recurrences (see Corollary 1) show us how to calculate the
expansion coefficients of $\partial p / \partial x$ and $\partial p / \partial y$ with coefficients associated to the parameters
(a+1, b, c+1) and (a, b+1, c+1), respectively. Moreover, Corollary 3 informs us
how to convert from expansions with parameters (a, b, c) to (a+1, b, c), (a, b+1, c)
and (a, b, c+1). We can combine these various recurrence relations to compute
[Figure 2: two panels plotted against N. Top panel: "Relative error of Laplacian evaluated at (0.1,0.2)", vertical axis: relative error. Bottom panel: "Time to calculate Laplacian coefficients", vertical axis: execution time, with curves labelled transform, setup, and apply.]

Fig. 2 Top: Error when evaluating the Laplacian of f(x, y) = cos(nxy/40) at (x, y) = (0.1, 0.2)
by expanding in degree N = (n + 1)(n + 2)/2 Jacobi polynomials on the triangle with
(a, b, c) = (0, 0, 0) and using the recurrences. Bottom: Execution times of (1) the fast transform,
(2) constructing the recurrences as 8 banded-block-banded matrices, and (3) applying the matrices

the coefficients of the Laplacian in the basis gos in an optimal complexity of 


$O(N) = O(d^2)$ operations, where d is the polynomial degree and N = d(d + 1)/2
is the total number of degrees of freedom.

For example, the Laplacian of f(x, y) = cos(nxy/40) can be computed by
first approximating f on the unit right-angled triangle to within machine precision 
by a polynomial, and then employing various recurrence relationships to calculate 
its Laplacian (Fig. 2). To do this efficiently, we store the recurrence relations 
as banded-block-banded matrices to take advantage of fast banded matrix-vector 
multiplication. 
6 Conclusion


We introduce ladder operators for systematically deriving sparse recurrence rela- 
tions for differentiation, conversion, and multiplication of Jacobi and orthogonal 
polynomials on the triangle. We use these recurrences to efficiently apply partial 
differential operators, in particular for calculating Laplacians. The importance of 
these relationships is that they allow general linear partial differential operators with 
polynomial coefficients to be represented as sparse operators acting on orthogonal 
polynomial expansions. This application will be the topic of a subsequent paper. 
Bernard Haasdonk, Boumediene Hamzi, Gabriele Santin,
and Dominik Wittwar 


1 Introduction 


Center manifold theory plays an important role in the study of the stability of 
dynamical systems when the equilibrium point is not hyperbolic. It isolates the 
complicated asymptotic behavior by locating the center manifold which is an 
invariant manifold tangent to the subspace spanned by the eigenspace of eigenvalues 
on the imaginary axis. Then, the dynamics of the original system will be essentially 
determined by the restriction of this dynamics on the center manifold since the local 
dynamic behavior “transverse” to this invariant manifold is relatively simple as it 
corresponds to the flows in the local stable (and unstable) manifolds. In practice, 
one does not compute the center manifold and its dynamics exactly since this 
requires the resolution of a quasilinear partial differential equation which is not 
easily solvable. In most cases of interest, an approximation of degree two or three 
of the solution is sufficient. Then, the reduced dynamics on the center manifold can 
be determined, its stability can be studied and then conclusions about the stability 
of the original system can be obtained [1, 3, 4, 6, 8]. 

In this article, we use greedy kernel methods to construct a data-based approx- 
imation of the center manifold. The present work is a preliminary study that is 
intended to introduce our concept and algorithm, and to test it on some examples. 
2 Background


We consider a large dimensional dynamical system

$$\dot x = f(x), \quad x \in D, \qquad (1)$$

where $f : D \to \mathbb{R}^n$ is a continuously differentiable function over the domain
$D \subset \mathbb{R}^n$ such that $0 \in D$. We are interested in the study of the behavior of the
system around an equilibrium point $\bar x \in D$, i.e., $f(\bar x) = 0$, possibly analyzing a
smaller dimensional system.

Without loss of generality, we may assume that the equilibrium is $\bar x = 0$, and,
letting $L := \frac{\partial f}{\partial x}(x)\big|_{x=0}$, we can rewrite (1) as

$$\dot x = f(x) = Lx + N(x),$$

with a suitable nonlinear component N, and denote as $\sigma_R(L)$ the set of real parts of
the eigenvalues of L. A classical result relates the stability of the equilibrium with
the spectrum of L, and in particular it is known that if L has all its eigenvalues with
negative real parts, i.e., $\sigma_R(L) \subset \mathbb{R}_{<0}$, then the origin is asymptotically stable, and
if L has some eigenvalues with positive real parts, then the origin is unstable. If
instead $\sigma_R(L) \subset \mathbb{R}_{\le 0}$ with $0 \in \sigma_R(L)$, the linearization fails to determine the stability
properties of the origin, and thus the analysis of this situation requires additional tools.
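The three cases of this classical result can be summarised in a short sketch. It is restricted, purely for illustration, to 2 x 2 matrices, whose eigenvalues follow from the trace and the determinant; the function names, the tolerance and the example matrices are hypothetical.

```python
import cmath

def real_parts_2x2(L):
    """Real parts of the two eigenvalues of a 2x2 matrix [[p, q], [r, s]]."""
    (p, q), (r, s) = L
    trace, det = p + s, p * s - q * r
    disc = cmath.sqrt(trace * trace - 4 * det)
    return sorted([((trace - disc) / 2).real, ((trace + disc) / 2).real])

def classify(L, tol=1e-12):
    """Linearized stability verdict for the origin of x' = L x + N(x)."""
    largest = max(real_parts_2x2(L))
    if largest < -tol:
        return "asymptotically stable"
    if largest > tol:
        return "unstable"
    return "inconclusive"   # some eigenvalue has zero real part

print(classify([[-1, 0], [0, -2]]))   # all real parts negative
print(classify([[1, 0], [0, -1]]))    # one positive real part
print(classify([[0, 0], [0, -1]]))    # zero and negative real parts
```

The last case is the one addressed in the remainder of this section: the linear part alone does not decide stability, and the nonlinear terms must be taken into account.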

In this case, we can first use a linear change of coordinates to separate the zero
and the negative eigenvalues, i.e., we can rewrite (1) as

$$\begin{aligned}
\dot x &= L_1 x + N_1(x, y), \\
\dot y &= L_2 y + N_2(x, y),
\end{aligned} \qquad (2)$$

where $L_1 \in \mathbb{R}^{d \times d}$ is such that $\sigma_R(L_1) = \{0\}$ and $L_2 \in \mathbb{R}^{m \times m}$ with m := n - d
is such that $\sigma_R(L_2) \subset \mathbb{R}_{<0}$. The nonlinear functions $N_1 : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$ and
$N_2 : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^m$ are continuously differentiable. Intuitively, we expect the
stability of the equilibrium to only depend on the nonlinear term $N_1(x, y)$. This
intuition turns out to be correct, and indeed it can be properly formalized by means
of the center manifold theorem.

We start by recalling a sufficient condition for the existence of a center manifold. 


Theorem 1 ([1]) If $N_1$ and $N_2$ are twice continuously differentiable and are such
that

$$N_i(0,0) = 0, \quad \frac{\partial N_i}{\partial x}(0,0) = 0, \quad \frac{\partial N_i}{\partial y}(0,0) = 0, \quad i = 1, 2,$$

and if the eigenvalues of $L_1$ have zero real parts, and all the eigenvalues of $L_2$ have
negative real parts, then there exists a neighbourhood $\Omega \subset \mathbb{R}^d$ of the origin $0 \in \mathbb{R}^d$
and a center manifold $h : \Omega \to \mathbb{R}^m$ for (2), i.e., y = h(x) is an invariant manifold
for (2),¹ h is smooth, and

$$h(0) = 0, \quad Dh(0) = 0. \qquad (3)$$

Under the assumptions of this theorem, using (2) we deduce that h satisfies the PDE

$$L_2 h(x) + N_2(x, h(x)) = Dh(x) \left( L_1 x + N_1(x, h(x)) \right), \qquad (4)$$

and the following center manifold theorem ensures that there are smooth solutions
to this PDE. Moreover, it also allows one to deduce the stability of the origin of the full
order system (2) from the stability of the origin of a reduced order system called the
center dynamics.


Theorem 2 (Center Manifold Theorem [1]) The equilibrium x = 0, y = 0 of the
original dynamics is locally asymptotically stable (resp. unstable) if and only if the
equilibrium x = 0 of the center dynamics (dynamics on the center manifold)

$$\dot x = L_1 x + N_1(x, h(x)), \qquad (5)$$

is locally asymptotically stable (resp. unstable).


In particular, this result guarantees that, after solving the PDE (4), the problem
of analyzing the stability properties of the system (2) reduces to analyzing the
nonlinear stability of the lower dimensional system (5). This second problem is of
smaller dimension and thus, provided that h is known, the approach is attractive
to obtain information on the system (1) via a reduced model.

Moreover, we remark that an exact knowledge of h is not required for this 
purpose, i.e., it is sufficient to have an approximate solution of the PDE (4). Indeed, 
it is frequently sufficient to compute only the low degree terms of the Taylor series 
expansion of h around x = 0, i.e., if (-)!* is the degree k part of the Taylor series of 
h, the approximation 


h(x) © hx + А21) + АХ) +... 4 А (х) (6) 


is sufficient to obtain an approximation of the dynamics of order ef as |х| < e. 
The approximation (6) can be obtained by coefficient comparison, thus rewriting 


¹ A differentiable manifold M is said to be invariant under the flow of a vector field X if for $x \in M$,
$F_t(x) \in M$ for small t > 0, where $F_t(x)$ is the flow of X.
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the PDE (4) as a set of algebraic equations as 


Lol) = jl, 
9 


21 
Lah PIG) + NENG, А!) = 569 (Lia + NP Gs i10) 


Dj [2] [2] 
Loh! x) + (No, их) = a (x) (uu + (м (х, nl(x))) ) 


We remark that this methodology is valid for parameterized dynamical systems and 
is used to study the stability of dynamical systems with bifurcations. 
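
To illustrate the coefficient comparison, the following worked computation uses a hypothetical scalar system that is not taken from the text: $\dot{x} = -xy$, $\dot{y} = -y + x^2$, so that $L_1 = 0$, $L_2 = -1$, $N_1(x, y) = -xy$ and $N_2(x, y) = x^2$. It is only a sketch of how the low-degree terms of $h$ can be obtained by hand.

```latex
% Assumed example system (not from the text): \dot{x} = -xy, \dot{y} = -y + x^2.
% Ansatz compatible with h(0) = 0, Dh(0) = 0 (odd-degree terms vanish here):
h(x) = a x^2 + b x^4 + O(x^6).
% Insert into the PDE (4), L_2 h + N_2(x, h) = Dh(x) (L_1 x + N_1(x, h)):
-(a x^2 + b x^4) + x^2 = (2 a x + 4 b x^3)\,(-x)\,(a x^2 + b x^4) + O(x^6)
                       = -2 a^2 x^4 + O(x^6).
% Compare coefficients degree by degree:
x^2:\quad 1 - a = 0 \;\Rightarrow\; a = 1, \qquad
x^4:\quad -b = -2 a^2 \;\Rightarrow\; b = 2.
% Resulting center dynamics (5):
\dot{x} = -x\,h(x) = -x^3 - 2 x^5 + O(x^7).
```

Under these assumptions the leading term $-x^3$ already shows that the origin of the reduced system is locally asymptotically stable, so by Theorem 2 the same holds for the full system.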

Nevertheless, even this approximate knowledge of $h$ can be difficult to obtain in
practice for a general ODE. To overcome this limitation, and since approximate
knowledge of the manifold is sufficient, our goal in this paper is to find a data-based
approximation of the center manifold. This approximation is based solely on
the knowledge of the splitting (2) and on the numerical computation of a set of
trajectories of the system, and it provides an approximation of $h$ which can be used
to study the stability of the system.


3 Kernel Approximation 


We want to build a surrogate model $s_n : \mathbb{R}^n \to \mathbb{R}^m$ which approximates the center
manifold $h$ on a suitable set $\Omega \subset \mathbb{R}^n$, in the sense that $s_n(x) \approx h(x)$ for all $x \in \Omega$.
This model is constructed in a data-based way, i.e., we assume some knowledge of
the map $h$ on a finite set of input parameters, or training data. In practice, such values
are computed from high-fidelity numerical approximations, which will be discussed
in detail in the following.

The surrogate is based on kernel approximation, which allows the use of scattered
data, i.e., we do not require any grid structure on the set of training data. Moreover,
since the unknown function $h$ is vector-valued, we employ here matrix-valued
kernels. Details on kernel-based approximation can be found e.g. in [9], and the
extension to the vectorial case is detailed e.g. in [5, 10]. We recall here only that a
positive definite matrix-valued kernel on $\Omega$ is a function $K : \Omega \times \Omega \to \mathbb{R}^{m \times m}$
such that $K(x, y) = K(y, x)^T$ for all $x, y \in \Omega$ and $[K(x_i, x_j)]_{i,j=1}^{N} \in \mathbb{R}^{mN \times mN}$ is
positive semidefinite for any set $\{x_1, \dots, x_N\} \subset \Omega$ of pairwise distinct points, for
all $N \in \mathbb{N}$. Associated to a positive definite kernel there is a unique Hilbert space
$\mathcal{H}$ of functions $\Omega \to \mathbb{R}^m$, named native space, where the kernel is reproducing,
meaning that $K(\cdot, x)\alpha$ is the Riesz representer of the directional point evaluation
$\delta_x^{\alpha}(f) := \alpha^T f(x)$, for all $\alpha \in \mathbb{R}^m$, $x \in \Omega$.

We consider here a twice continuously differentiable matrix-valued kernel $K$ on
$\Omega$, and we use a specific functional formulation for our approximation and a specific
cost function, in order to construct a surrogate that is well suited for the particular
approximation task.
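In detail, the approximant takes the form


    $s_n(x) = \sum_{i=1}^{n_1} K(x, x_i^{(1)})\,\alpha_i + \sum_{j=1}^{n_2} \sum_{i=1}^{m} \partial_i^{(2)} K(x, x_j^{(2)})\,\beta_{i,j},$


with centers $x_i^{(1)} \in X^{(1)} = \{x_i^{(1)}\}_{i=1}^{n_1}$, $x_j^{(2)} \in X^{(2)} = \{x_j^{(2)}\}_{j=1}^{n_2}$
and coefficient vectors $\alpha_i, \beta_{i,j} \in \mathbb{R}^m$. Here the superscript $(2)$ denotes that the
derivative with respect to the second kernel argument is taken.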

Subsequently, we assume that a sufficient amount of data $X_{N^*} = \{x_1, \dots, x_{N^*}\}$
and $Y_{N^*} = \{y_1, \dots, y_{N^*}\}$ is available, generated for example by
running a numerical scheme to compute discrete trajectories for different initial
values $(x_0, y_0)$. For this step, we need to assume that the variable splitting (2) is
known in advance. Note that this is not a severe restriction, as for a general ODE (1)
the required state transformation can be determined by eigenvalue decomposition
of $L$.

Observe that we do not know if a data pair $(x_i, y_i)$ lies on the center manifold,
i.e. if $y_i = h(x_i)$ holds. We only know that the data converges asymptotically to the
center manifold as $x_i \to 0$. Thus, an interpolation-based surrogate which merely
interpolates the data on a given subset $X \subset X_{N^*}$ seems ill-suited for our purposes.
Instead we consider another set of conditions to define the approximant. First, we
still require the conditions in (3) to be satisfied by our approximation. Moreover,
for the given subsets $X = \{x_1, \dots, x_N\}$ and $Y = \{y_1, \dots, y_N\}$, we compute
our approximant by minimizing the following functional $J : \mathcal{H} \to \mathbb{R}$ under the
constraint $s(0) = 0$, $Ds(0) = 0$:


    $J(s) := \|s\|_{\mathcal{H}}^2 + \sum_{i=1}^{N} (s(x_i) - y_i)^T \omega_i\, (s(x_i) - y_i).$    (7)

Here $\omega_i \in \mathbb{R}^{m \times m}$ is a positive definite weight matrix. It can be shown that (7) has a
unique minimizer $s_n$ (see [11]). In particular $s_n$ and its derivative $Ds_n$ have the form

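    $s_n(x) = \sum_{i=1}^{N+1} K(x, x_i)\,\alpha_i + \sum_{i=1}^{m} \partial_i^{(2)} K(x, 0)\,\beta_i,$    (8)

    $Ds_n(x) = \sum_{i=1}^{N+1} D^{(1)} K(x, x_i)\,\alpha_i + \sum_{i=1}^{m} D^{(1)} \partial_i^{(2)} K(x, 0)\,\beta_i,$

where we set $x_{N+1} := 0$. The coefficient vectors $\alpha_i$, $\beta_i$ can be computed by solving
the system

    $\begin{bmatrix} A + W & B \\ B^T & C \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} Y \\ Z \end{bmatrix}$    (9)

with

    $A := (K(x_i, x_j))_{i,j} \in \mathbb{R}^{m(N+1) \times m(N+1)},$
    $W := \mathrm{diag}(\omega_1^{-1}, \dots, \omega_N^{-1}, 0) \in \mathbb{R}^{m(N+1) \times m(N+1)},$
    $B := (\partial_j^{(2)} K(x_i, 0))_{i,j} \in \mathbb{R}^{m(N+1) \times m^2},$
    $C := (\partial_i^{(1)} \partial_j^{(2)} K(0, 0))_{i,j} \in \mathbb{R}^{m^2 \times m^2},$
    $Y := (y_1^T, \dots, y_N^T, 0)^T \in \mathbb{R}^{m(N+1)},$
    $Z := 0 \in \mathbb{R}^{m^2}.$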


The weight matrices $\omega_i$ can either be chosen manually, or a regularizing function
$r : \Omega \to \mathbb{R}^{m \times m}$ can be prescribed such that $\omega_i = r(x_i)$ is symmetric and positive
definite. In our numerical examples in Sect. 4 we chose a constant regularization
function, i.e.


    $\omega_i = r(x_i) = \lambda I_m$


for some $\lambda > 0$. However, one might consider a more general approach, where the
weight increases as the data tends to the origin, i.e. $\omega_i > \omega_j$ if $\|x_i\| < \|x_j\|$.


3.1 Greedy Approximation 


If the technique of the previous section is used as it is, the surrogate (8) is given by
an expansion with $N^*$ terms, where $N^*$ is the number of points in the training set.
Therefore, the model evaluation might not be efficient enough if the model is built
using too large a dataset. Furthermore, the computation of the coefficients in (8)
requires the solution of the linear system (9), whose size again scales with the size
of the training set, and which can be severely ill-conditioned for poorly placed
points.

To mitigate both problems, we employ an algorithm that aims at selecting small
subsets $X_N$, $Y_N$ of points such that the surrogate computed with these sets is a
sufficiently good approximation of the one which uses the full sets. The algorithm
selects the points in a greedy way, i.e., one point at a time is selected and added to
the current training set. In this way, it is possible to identify a good set without the
need to solve a nearly infeasible combinatorial problem.

The selection is performed using the P-greedy method of [2] applied to the kernel
$K$, such that the set of points is selected before the computation of the surrogate.
The number of points, and therefore the expansion size and evaluation speed,
depends on a prescribed target accuracy $\varepsilon_{\mathrm{tol}} > 0$. For details on the implementation
of the method and its convergence properties we refer to [7].
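
The greedy selection can be sketched as follows. This is a minimal illustration only, assuming a scalar Gaussian kernel on a one-dimensional candidate set and a Newton-basis update of the power function; the names `gauss` and `p_greedy`, the candidate grid and the tolerance value are hypothetical choices and are not taken from [2] or [7].

```python
import math

def gauss(x, y):
    """Scalar Gaussian kernel exp(-(x - y)^2 / 2), assumed for illustration."""
    return math.exp(-0.5 * (x - y) ** 2)

def p_greedy(points, kernel, tol, max_points=None):
    """Greedily pick indices that maximize the power function.

    The squared power function is updated with a Newton basis, so no
    linear system has to be solved during the selection.
    Returns the selected indices and the final maximal power value.
    """
    n = len(points)
    max_points = n if max_points is None else max_points
    power2 = [kernel(p, p) for p in points]   # squared power function, no centers yet
    basis = []                                # Newton basis values on all candidates
    selected = []
    while len(selected) < max_points:
        i_max = max(range(n), key=lambda i: power2[i])
        if math.sqrt(power2[i_max]) < tol:
            break                             # target accuracy reached
        x_new = points[i_max]
        norm = math.sqrt(power2[i_max])
        # Orthogonalize k(., x_new) against the previous Newton basis functions.
        v = [(kernel(points[i], x_new)
              - sum(b[i] * b[i_max] for b in basis)) / norm
             for i in range(n)]
        basis.append(v)
        selected.append(i_max)
        # Clamp at zero to guard against round-off.
        power2 = [max(power2[i] - v[i] ** 2, 0.0) for i in range(n)]
    return selected, math.sqrt(max(power2))

# Hypothetical candidate set: a fine grid on [-0.1, 0.1].
candidates = [-0.1 + 0.001 * i for i in range(201)]
selected, max_power = p_greedy(candidates, gauss, tol=1e-4)
```

Because the selection depends only on the kernel and the candidate locations, it can be run before any target values are used, which matches the description above.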


4 Numerical Examples 


We now test our method on three different examples. In each of them, we specify
the setting and the parameters used to build the surrogate and visualize our
approximation to the center manifold. Additionally, we compute the pointwise
residual


    $r(x) = Ds_n(x)\,\bigl(L_1 x + N_1(x, s_n(x))\bigr) - \bigl(L_2 s_n(x) + N_2(x, s_n(x))\bigr),$


which measures how well the surrogate $s_n$ satisfies the PDE (4).
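
As a rough sketch of how such a residual can be evaluated, the snippet below treats a scalar candidate $s$ and approximates $Ds$ by a central difference. The system used ($\dot{x} = -xy$, $\dot{y} = -y + x^2 - 2y^2$, i.e. $L_1 = 0$, $L_2 = -1$) is an assumption made for illustration, and the function name `residual` is hypothetical. For this assumed system the curve $y = x^2$ is invariant, so its residual should vanish up to round-off.

```python
def residual(s, x, L1=0.0, L2=-1.0, eps=1e-6):
    """r(x) = Ds(x) (L1 x + N1(x, s(x))) - (L2 s(x) + N2(x, s(x)))."""
    # Assumed nonlinearities of the illustrative system.
    N1 = lambda x, y: -x * y
    N2 = lambda x, y: x**2 - 2.0 * y**2
    ds = (s(x + eps) - s(x - eps)) / (2.0 * eps)   # central difference for Ds
    y = s(x)
    return ds * (L1 * x + N1(x, y)) - (L2 * y + N2(x, y))

exact = lambda x: x**2          # invariant curve of the assumed system
crude = lambda x: 0.5 * x**2    # deliberately wrong candidate

xs = [-0.1 + 0.02 * i for i in range(11)]
max_exact = max(abs(residual(exact, x)) for x in xs)
max_crude = max(abs(residual(crude, x)) for x in xs)
```

A surrogate produced by the kernel method would be passed in place of `exact` or `crude`; the wrong candidate yields a residual of size $x^2/2$, which is what makes the quantity useful as an error indicator.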

In all three examples, the greedy algorithm is used to select a suitable subset
of the points, and in all cases the procedure is stopped with a prescribed tolerance $\varepsilon_{\mathrm{tol}}$. In the
first two examples we set £;5; :— 10- P, while 2,4; :— 1010 is used in the last one. 


4.1 Example 1


We consider the 2-dimensional system 


х= Lix + N(x, y) =О-+ xy 
. j (10) 
у = Lay + №(х, у) = =y + x^. 


We generate the training data by solving (10) with an implicit Euler scheme for
initial time $t_0 = 0$, final time $T = 1000$ and time step $\Delta t = 0.1$. We
initiate the numerical procedure with initial values $(x_0, y_0) \in \{\pm 0.8\} \times \{\pm 0.8\}$ and
store the resulting data pairs in $X$ and $Y$ after discarding all data whose $x$-values are
not contained in the neighborhood $[-0.1, 0.1]$, which results in $N^* = 38{,}248$ data
pairs.


We run the greedy algorithm for the kernels $k_1(x, y) := (1 + xy/2)^4$ and
$k_2(x, y) := e^{-(x-y)^2/2}$. This results in the sets $X_1$ and $X_2$ which contain 14 and
6 points, respectively. The corresponding approximations $s_1$ and $s_2$ for the constant
regularization function $r = 10^{-10}$ are plotted in Fig. 1 over the domain $[-0.1, 0.1]$.
The pointwise residual is depicted in Fig. 2.

Fig. 1 Approximations $s_1$ and $s_2$ of the center manifold

Fig. 2 Residuals $r_1$ and $r_2$ of the center manifold


4.2 Example 2 


We consider the 2-dimensional system 


    $\dot{x} = L_1 x + N_1(x, y) = 0 - xy,$
    $\dot{y} = L_2 y + N_2(x, y) = -y + x^2 - 2y^2.$    (11)


The training data is generated the same way as in Example 1. We again use the
kernels $k_1$ and $k_2$. The greedy algorithm gives sets $X_1$ and $X_2$ of size 12 and 6,
respectively. The evaluation of the approximations $s_1$ and $s_2$ over the neighborhood
$[-0.1, 0.1]$ can be seen in Fig. 3, while the respective pointwise residuals are plotted
in Fig. 4.

Fig. 3 Approximations $s_1$ and $s_2$ of the center manifold

Fig. 4 Residuals $r_1$ and $r_2$ of the center manifold
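
The data generation described for the first two examples can be sketched as below. This is not the authors' implementation: it assumes that the system of Example 2 reads $\dot{x} = -xy$, $\dot{y} = -y + x^2 - 2y^2$, solves each implicit Euler step with a hand-written Newton iteration, and uses the hypothetical names `implicit_euler_step` and `generate_pairs`.

```python
def implicit_euler_step(x, y, dt, newton_iters=20, tol=1e-14):
    """Solve z_new = z + dt * f(z_new) for z = (x, y) by Newton's method."""
    xn, yn = x, y                                  # initial guess: previous state
    for _ in range(newton_iters):
        # Residual G(z_new) = z_new - z - dt * f(z_new) for the assumed system.
        g1 = xn - x + dt * xn * yn
        g2 = yn - y - dt * (-yn + xn**2 - 2.0 * yn**2)
        # Entries of the Jacobian of G.
        a, b = 1.0 + dt * yn, dt * xn
        c, d = -2.0 * dt * xn, 1.0 + dt * (1.0 + 4.0 * yn)
        det = a * d - b * c
        dx = (d * g1 - b * g2) / det
        dy = (a * g2 - c * g1) / det
        xn, yn = xn - dx, yn - dy
        if abs(dx) + abs(dy) < tol:
            break
    return xn, yn

def generate_pairs(T=1000.0, dt=0.1, box=0.1):
    """Collect (x, y) samples of four trajectories that lie in |x| <= box."""
    pairs = []
    for x0 in (-0.8, 0.8):
        for y0 in (-0.8, 0.8):
            x, y = x0, y0
            for _ in range(int(round(T / dt))):
                x, y = implicit_euler_step(x, y, dt)
                if abs(x) <= box:                  # keep only data near the origin
                    pairs.append((x, y))
    return pairs

pairs = generate_pairs()
# For the assumed system y = x^2 is invariant, so late samples should be close to it.
max_dev = max(abs(y - x**2) for x, y in pairs)
```

The retained samples are scattered and strongly clustered near the origin, which is the situation the kernel surrogate and the greedy selection are meant to handle.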


4.3 Example 3 


We consider the (2 + 1)-dimensional system


х = Lix + М(х, y) = | Е (2) T a (12) 


у = Lzy + №(х, y) = —y = x] — x3 + y”. 


We generate the training data in a similar fashion as before. We again use the
implicit Euler scheme with start time $t_0 = 0$, final time $T = 1000$ and time step
$\Delta t = 0.1$. The Euler method is performed for initial data $(x_0, y_0) \in \{\pm 0.8\}^3$ and the
resulting trajectories are stored in $X$ and $Y$, where only data with $x \in [-0.1, 0.1]^2$
was considered; this leads to $N^* = 78{,}796$ data pairs. We use the kernels $k_1(x, y) =
(1 + x^T y/2)^4$ and $k_2(x, y) = e^{-\|x - y\|^2/2}$, and the greedy-selected sets have the
size 21 (for $k_1$) and 25 (for $k_2$), respectively. The approximations $s_1$, $s_2$ and their
corresponding residuals $r_1$ and $r_2$ are computed over the domain $[-0.1, 0.1]^2$. The
results can be seen in Fig. 5.

Fig. 5 Approximations $s_1$ and $s_2$ of the center manifold and corresponding residuals $r_1$ and $r_2$ (panels: Approximation $s_1$, Approximation $s_2$, Residual $r_1$, Residual $r_2$; axes $x_1$, $x_2$)

We remark that in all three experiments both kernels give comparable results
in terms of error magnitude, and they both provide a good approximation of the
manifold.


5 Conclusions


In this paper we introduced a novel algorithm to approximate the center manifold of 
a given ODE using a data-based surrogate. 

This algorithm computes an approximation of the manifold from a set of 
numerical trajectories with different initial data. It is based on kernel methods, 
which allow the use of the scattered data generated by these simulations as 
training points. Moreover, an application-specific ansatz and cost function have been 
employed in order to enforce suitable properties on the surrogate. 

Several numerical experiments suggest that the present method can reach a
significant accuracy, and that it has the potential to be used as an effective model
reduction technique. It seems promising to apply this approach to high-dimensional
systems, as the approximation technique can be extended straightforwardly and is
less prone to the curse of dimensionality than grid-based approximation techniques.
An interesting extension would consist of determining the decomposition (2) in a
data-based fashion by suitable processing of the trajectory data.


An Adaptive Error Inhibiting Block One-Step Method for Ordinary
Differential Equations


Jiaxi Gu and Jae-Hun Jung 


1 Introduction 


General linear methods have been extensively studied for solving ODEs. Among the
large family of general linear methods, the diagonally implicit multistage integration
methods (DIMSIMs) in [1] are special cases which exhibit considerable
potential for efficient implementation and provide a global error of the same order
as the local truncation error. In [2], it was demonstrated that finite difference
methods for PDEs can be constructed such that their convergence rates, or the order
of their global errors, are higher than the order of the truncation errors. Following
this idea, Ditkowski and Gottlieb devised the error inhibiting strategy in [3]: by
inhibiting the lowest-order term in the truncation error from accumulating over time,
they showed that the global error of the scheme is one order higher than the local
truncation error. The form of the error inhibiting scheme is inspired by the work of
[7], where a block of $s$ new step values is obtained at each step. The key idea of this
method is to construct a coefficient matrix whose null space contains the leading
term of the local truncation error.

In this work, we further improve the original error inhibiting method
by introducing an additional free parameter used in the radial basis function
(RBF) approximations. The main idea of the proposed method is to adopt
the free parameter in the reconstruction of the error inhibiting method and
to control it for further possible error cancellations. This results in a higher
order of convergence than the original method. One advantage is that the
proposed method does not need any additional conditions, so it is efficient to
implement.

The next section reviews the explicit error inhibiting block one-step method.
In Sect. 3, we explain the RBF interpolation. In Sect. 4, we show how the new
method can be derived, followed by Sect. 5, where numerical results are provided
verifying that the convergence rate of the proposed method is increased by one
order. A brief conclusion and an outline of our future research are presented in
Sect. 6.


2 Error Inhibiting Block One-Step Method 


Consider the initial value problem for the first-order ODE below

    $u' = f(t, u), \quad t \ge a,$
    $u(a) = u_a,$    (1)

where we assume $f(t, u)$ is uniformly Lipschitz continuous in $u$ and continuous in
$t$. We choose a value $h$ for the step size and set $t_n = a + nh$, a discrete sequence in
the time domain. Denote the numerical approximation of the solution $u(t_n)$ by $v_n$.
Define the solution vector $U_n$ by

    $U_n = [u_{n+(s-1)/s}, \dots, u_{n+1/s}, u_n]^T,$

where $u_{n+j/s} = u(t_n + jh/s)$ is the exact solution at $t = t_n + jh/s$ for $j = 0, \dots, s-1$.
The corresponding approximation vector $V_n$ is defined as

    $V_n = [v_{n+(s-1)/s}, \dots, v_{n+1/s}, v_n]^T.$

In [3], the scheme is formulated as

    $V_{n+1} = Q V_n,$    (2)

where the operator $Q$ is represented by

    $Q = A + hBf$

and $A, B \in \mathbb{R}^{s \times s}$. There are four sufficient conditions imposed on the matrices $A$ and
$B$ in order for the scheme to be error inhibiting:


1. rank($A$) = 1.
2. The only non-zero eigenvalue of $A$ is 1 and its corresponding eigenvector is
   $[1, \dots, 1]^T$.
3. $A$ can be diagonalized.
4. The matrices $A$ and $B$ are constructed such that when the local truncation error
   is multiplied by the discrete solution operator, we have

       $\|Q \tau_n\| \le O(h) \cdot \|\tau_n\|.$

   This is accomplished by requiring that the leading-order term of the local
   truncation error is in the eigenspace of $A$ associated with the zero eigenvalue.


We derive the matrices $A$ and $B$ with symbolic computation. As an example
of the derivation of the error inhibiting method, we consider the construction of the
scheme with $s = 2$. The solution vector is then

    $U_n = [u_{n+1/2}, u_n]^T,$

and the corresponding approximation vector is given by

    $V_n = [v_{n+1/2}, v_n]^T.$

In order to satisfy the conditions listed above we first select

    $A = \begin{bmatrix} 1-\nu & \nu \\ 1-\nu & \nu \end{bmatrix},$    (3)
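which can be diagonalized as

    $A = \begin{bmatrix} 1 & \nu \\ 1 & \nu - 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1-\nu & \nu \\ 1 & -1 \end{bmatrix}.$    (4)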


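Then conditions 1, 2 and 3 are satisfied. Further suppose that

    $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.$    (5)

Then

    $V_{n+1} = \begin{bmatrix} 1-\nu & \nu \\ 1-\nu & \nu \end{bmatrix} V_n + h \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix},$    (6)

where $f_{n+1/2} = f(t_{n+1/2}, v_{n+1/2})$ and $f_n = f(t_n, v_n)$. The components of $V_{n+1}$
are

    $v_{n+3/2} = (1-\nu)\,v_{n+1/2} + \nu\,v_n + h\,(b_{11} f_{n+1/2} + b_{12} f_n),$
    $v_{n+1} = (1-\nu)\,v_{n+1/2} + \nu\,v_n + h\,(b_{21} f_{n+1/2} + b_{22} f_n).$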


We write each difference equation in the form of an error normalized by the step
size and then insert the exact solution of the ODE into the difference equation.
Expanding $u_{n+3/2}$, $u_{n+1}$ and $u_{n+1/2}$ around $t = t_n$ in Taylor series gives the local
truncation error

    $\tau_n = (\tau_{n+1/2}, \tau_n)^T,$

where

    $\tau_{n+1/2} = \frac{1}{2}(2 - 2b_{11} - 2b_{12} + \nu)\,u'_n + \frac{1}{8}(8 - 4b_{11} + \nu)\,u''_n h + \frac{1}{48}(26 - 6b_{11} + \nu)\,u'''_n h^2 + O(h^3),$    (7)

    $\tau_n = \frac{1}{2}(1 - 2b_{21} - 2b_{22} + \nu)\,u'_n + \frac{1}{8}(3 - 4b_{21} + \nu)\,u''_n h + \frac{1}{48}(7 - 6b_{21} + \nu)\,u'''_n h^2 + O(h^3).$    (8)

Setting the coefficients of the constant term and of the term in $h$ in (7) and (8) to zero, and
equating the quotient of the coefficients of the $h^2$ terms in (7) and (8) to $\frac{\nu}{\nu - 1}$,
condition 4 is satisfied.

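Finally we have the desired scheme as in [3]

    $V_{n+1} = \frac{1}{6} \begin{bmatrix} -1 & 7 \\ -1 & 7 \end{bmatrix} V_n + \frac{h}{24} \begin{bmatrix} 55 & -17 \\ 25 & 1 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix},$    (9)

and correspondingly the local truncation error is second order, as expected:

    $\tau_n = \frac{23}{576} \begin{bmatrix} 7 \\ 1 \end{bmatrix} u'''_n h^2 + O(h^3).$    (10)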
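
The scheme (9) is short enough to try out directly. The sketch below applies it to the test problem $u' = -u^2$, $u(0) = 1$ with exact solution $1/(t+1)$; taking the starting block from the exact solution is an assumption of this sketch, and the function name `eis_solve` is hypothetical.

```python
import math

def eis_solve(f, u_exact, t0, t_end, n_steps):
    """Error inhibiting block one-step scheme with s = 2 for autonomous f.

    The block [v_{n+1/2}, v_n] is advanced by one step of size h per iteration.
    The starting block is taken from the exact solution (an assumption here).
    """
    h = (t_end - t0) / n_steps
    v_half, v = u_exact(t0 + 0.5 * h), u_exact(t0)
    for _ in range(n_steps):
        f_half, f_n = f(v_half), f(v)
        base = (-v_half + 7.0 * v) / 6.0              # both rows of A are equal
        v_half, v = (base + h / 24.0 * (55.0 * f_half - 17.0 * f_n),
                     base + h / 24.0 * (25.0 * f_half + f_n))
    return v                                          # approximation of u(t_end)

f = lambda u: -u * u
u_exact = lambda t: 1.0 / (t + 1.0)

err_40 = abs(eis_solve(f, u_exact, 0.0, 1.0, 40) - u_exact(1.0))
err_80 = abs(eis_solve(f, u_exact, 0.0, 1.0, 80) - u_exact(1.0))
order = math.log(err_40 / err_80, 2)   # observed order of the global error
```

Although the local truncation error (10) is only second order, the observed order of the global error should be close to three, which is the error inhibiting effect described above.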
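3 RBF Interpolation


Now we briefly explain the RBF interpolation in one dimension. Suppose that for
a domain $\Omega \subset \mathbb{R}$, a data set $\{(x_i, u_i)\}_{i=0}^{N}$ is given, where $u_i$ is the value of the
unknown function $u(x)$ at $x = x_i \in \Omega$. We use the RBFs $\phi_i : \Omega \to \mathbb{R}$ defined by
$\phi_i(x) = \phi(|x - x_i|, \varepsilon_i)$, where $|x - x_i|$ is the distance between $x$ and $x_i$ and $\varepsilon_i$ is
a free parameter. The reconstruction of a function $u(x)$ is then made by a linear
combination of RBFs

    $I_N^R u(x) = \sum_{i=0}^{N} \lambda_i\, \phi(|x - x_i|, \varepsilon_i),$    (11)

where $\lambda_i$ are the expansion coefficients to be determined. Using the interpolation
condition $I_N^R u(x_i) = u_i$, $i = 0, \dots, N$, we can find the expansion coefficients $\lambda_i$
by solving the linear system

    $\begin{bmatrix} \phi(|x_0 - x_0|, \varepsilon_0) & \phi(|x_0 - x_1|, \varepsilon_1) & \cdots & \phi(|x_0 - x_N|, \varepsilon_N) \\ \phi(|x_1 - x_0|, \varepsilon_0) & \phi(|x_1 - x_1|, \varepsilon_1) & \cdots & \phi(|x_1 - x_N|, \varepsilon_N) \\ \vdots & \vdots & \ddots & \vdots \\ \phi(|x_N - x_0|, \varepsilon_0) & \phi(|x_N - x_1|, \varepsilon_1) & \cdots & \phi(|x_N - x_N|, \varepsilon_N) \end{bmatrix} \begin{bmatrix} \lambda_0 \\ \lambda_1 \\ \vdots \\ \lambda_N \end{bmatrix} = \begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_N \end{bmatrix}.$    (12)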

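If we choose the multiquadric RBF with all the free parameters equal, then the
interpolation matrix $A$ becomes a symmetric matrix with all diagonal entries 1,

    $A = \begin{bmatrix} 1 & \sqrt{1 + \varepsilon^2 (x_0 - x_1)^2} & \cdots & \sqrt{1 + \varepsilon^2 (x_0 - x_N)^2} \\ \sqrt{1 + \varepsilon^2 (x_1 - x_0)^2} & 1 & \cdots & \sqrt{1 + \varepsilon^2 (x_1 - x_N)^2} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{1 + \varepsilon^2 (x_N - x_0)^2} & \sqrt{1 + \varepsilon^2 (x_N - x_1)^2} & \cdots & 1 \end{bmatrix}.$    (13)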


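Consider the case of three equally spaced nodes $x_0, x_1, x_2$ with $x_0 < x_1 < x_2$.
Let $h$ be the grid spacing. Then the linear system becomes

    $\begin{bmatrix} 1 & \sqrt{1 + \varepsilon^2 h^2} & \sqrt{1 + 4\varepsilon^2 h^2} \\ \sqrt{1 + \varepsilon^2 h^2} & 1 & \sqrt{1 + \varepsilon^2 h^2} \\ \sqrt{1 + 4\varepsilon^2 h^2} & \sqrt{1 + \varepsilon^2 h^2} & 1 \end{bmatrix} \begin{bmatrix} \lambda_0 \\ \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}.$    (14)

By the closed-form expression for the RBF interpolant in [4],

    $I_2^R u(x) = \frac{1}{\det(A)} \sum_{i=0}^{2} u_i \det(A_i(x)),$    (15)

where $A_i(x)$, a $3 \times 3$ matrix, is obtained by replacing the $i$th row of $A$ with the row
vector

    $\bigl[\sqrt{1 + \varepsilon^2 (x - x_0)^2}, \ \sqrt{1 + \varepsilon^2 (x - x_1)^2}, \ \sqrt{1 + \varepsilon^2 (x - x_2)^2}\bigr].$

Differentiating the interpolant, we obtain the first-order derivative

    $\frac{d}{dx} I_2^R u(x) = \frac{1}{\det(A)} \sum_{i=0}^{2} u_i\, \frac{d}{dx} \det(A_i(x)).$    (16)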


We then estimate the derivative of $u$ at $x = x_1$ as we do in polynomial interpolation
for the central difference formula:

    $\frac{d}{dx} I_2^R u(x_1) = \frac{\sqrt{1 + 4\varepsilon^2 h^2} + 1}{4h\sqrt{1 + \varepsilon^2 h^2}}\,(u_2 - u_0).$    (17)

By employing the Taylor expansion of the quotient on the right-hand side of (17),
we have

    $\frac{d}{dx} I_2^R u(x_1) = \left( \frac{1}{2h} + \frac{\varepsilon^2 h}{4} + O(h^3) \right) (u_2 - u_0).$    (18)


The main feature of the RBF method is that it contains a free parameter, $\varepsilon$, which
we can use to further inhibit the errors. In the following section, we will
show that by using the parameter $\varepsilon$ coupled with $h^p$ terms, where $p > 2$, we can
increase the order of the local truncation error and further raise the order of the global
error by adopting the error inhibiting scheme.
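
The closed form (17) and its expansion (18) can be checked numerically. The sketch below solves the three-node multiquadric system (14) by Cramer's rule, differentiates the interpolant at the middle node, and compares the result with (17); the helper names (`mq`, `det3`, `rbf_derivative_at_center`, `closed_form_weight`) and the sample values are hypothetical.

```python
import math

def mq(r, eps):
    """Multiquadric RBF sqrt(1 + eps^2 r^2)."""
    return math.sqrt(1.0 + (eps * r) ** 2)

def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def rbf_derivative_at_center(u, h, eps):
    """Derivative at x1 = 0 of the MQ interpolant through nodes -h, 0, h."""
    xs = [-h, 0.0, h]
    A = [[mq(xi - xj, eps) for xj in xs] for xi in xs]
    d = det3(A)
    lam = []
    for j in range(3):                      # Cramer's rule for A lam = u
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = u[i]
        lam.append(det3(Aj) / d)
    # d/dx sqrt(1 + eps^2 (x - xj)^2) evaluated at x = 0
    return sum(lam[j] * eps**2 * (0.0 - xs[j]) / mq(0.0 - xs[j], eps)
               for j in range(3))

def closed_form_weight(h, eps):
    """Factor multiplying (u2 - u0) in (17)."""
    return ((math.sqrt(1.0 + 4.0 * eps**2 * h**2) + 1.0)
            / (4.0 * h * math.sqrt(1.0 + eps**2 * h**2)))

u = [1.0, 1.3, 0.7]                         # arbitrary sample data
h, eps = 0.1, 2.0
direct = rbf_derivative_at_center(u, h, eps)
closed = closed_form_weight(h, eps) * (u[2] - u[0])
# Distance between (17) and the two leading terms of (18) for a smaller h.
gap = abs(closed_form_weight(0.01, eps) - (1.0 / (2 * 0.01) + eps**2 * 0.01 / 4.0))
```

For $\varepsilon = 0$ the weight reduces to $1/(2h)$, the usual central difference; the extra term $\varepsilon^2 h / 4$ is the handle that the adaptive scheme exploits.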


4 Construction of the Adaptive Error Inhibiting Scheme 


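Following the main feature of the RBF method explained in the preceding section,
we try to establish a similar explicit block one-step scheme that provides a higher
order of convergence by adding one more block containing the free parameters $\varepsilon_1$ and $\varepsilon_2$,
coupled with the $h^p$ term. With $p = 3$, we have

    $V_{n+1} = \begin{bmatrix} 1-\nu & \nu \\ 1-\nu & \nu \end{bmatrix} V_n + h \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix} + h^3 \begin{bmatrix} 0 & \varepsilon_1 \\ 0 & \varepsilon_2 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix}.$    (19)

We measure the one-step error normalized by the step size as in Sect. 2. Expanding
$u_{n+3/2}$, $u_{n+1}$ and $u_{n+1/2}$ around $t = t_n$ in Taylor series again yields the local
truncation error

    $\tau_n = [\tau_{n+1/2}, \tau_n]^T,$

where

    $\tau_{n+1/2} = \frac{1}{2}(2 - 2b_{11} - 2b_{12} + \nu)\,u'_n + \frac{1}{8}(8 - 4b_{11} + \nu)\,u''_n h + \Bigl(-\varepsilon_1 u'_n + \frac{1}{48}(26 - 6b_{11} + \nu)\,u'''_n\Bigr) h^2 + \frac{1}{384}(80 - 8b_{11} + \nu)\,u_n^{(4)} h^3 + O(h^4),$    (20)

    $\tau_n = \frac{1}{2}(1 - 2b_{21} - 2b_{22} + \nu)\,u'_n + \frac{1}{8}(3 - 4b_{21} + \nu)\,u''_n h + \Bigl(-\varepsilon_2 u'_n + \frac{1}{48}(7 - 6b_{21} + \nu)\,u'''_n\Bigr) h^2 + \frac{1}{384}(15 - 8b_{21} + \nu)\,u_n^{(4)} h^3 + O(h^4).$    (21)

Annihilating the first two terms in (20) and (21), and equating the quotient of the
coefficients of the $h^3$ terms in (20) and (21) to $\frac{\nu}{\nu - 1}$, we have the scheme

    $V_{n+1} = \frac{1}{7} \begin{bmatrix} -1 & 8 \\ -1 & 8 \end{bmatrix} V_n + \frac{h}{28} \begin{bmatrix} 64 & -20 \\ 29 & 1 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix} + h^3 \begin{bmatrix} 0 & \varepsilon_1 \\ 0 & \varepsilon_2 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix}.$    (22)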


We can easily check that the scheme (22) satisfies the four conditions in Sect. 2.
Further annihilating the coefficients of the $h^2$ terms, we get the optimal values of $\varepsilon_1$
and $\varepsilon_2$:

    $\varepsilon_1 = \frac{47\,u_n^{(3)}}{168\,u'_n},$    (23)

    $\varepsilon_2 = \frac{9\,u_n^{(3)}}{224\,u'_n}.$    (24)

Our new scheme has the truncation error

    $\tau_n = \frac{55}{2688} \begin{bmatrix} 8 \\ 1 \end{bmatrix} u_n^{(4)} h^3 + O(h^4).$    (25)

Note that in our new scheme, we need the value of $u^{(3)}$ at each step. This higher-order
derivative can be computed by differentiating the function $f$ on
the right-hand side of (1) twice. However, we choose to estimate the third-order
derivative. For $u'$, we use the given condition from (1), i.e. $u'(t) = f(t, u(t))$.
For the third-order derivative $u^{(3)}$, we employ the second-order central difference
formula for $f''(t, u(t))$ at $t = t_n$ as

    $u_n^{(3)} = f''(t_n, u_n) \approx \frac{4\,(f_{n+1/2} - 2f_n + f_{n-1/2})}{h^2},$    (26)

where $f_{n+1/2}$, $f_n$ and $f_{n-1/2}$ are given values. For this computation, no additional
conditions are necessary. The truncation error is still third-order accurate, $O(h^3)$,
as in (25), so by the error inhibiting strategy we end up with a global error that is
$O(h^4)$, which is confirmed in the following section.

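We conclude this section with a comparison of three methods. For the DIMSIM of
type 3,

    $\begin{bmatrix} v_{n+2} \\ v_{n+1} \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 7 & -3 \\ 7 & -3 \end{bmatrix} \begin{bmatrix} v_{n+1} \\ v_n \end{bmatrix} + \frac{h}{8} \begin{bmatrix} 9 & -7 \\ -3 & -3 \end{bmatrix} \begin{bmatrix} f_{n+1} \\ f_n \end{bmatrix},$

two steps $v_n$ and $v_{n+1}$ are employed to update the step $v_{n+1}$ and obtain the step
$v_{n+2}$.
For the error inhibiting scheme,

    $\begin{bmatrix} v_{n+3/2} \\ v_{n+1} \end{bmatrix} = \frac{1}{6} \begin{bmatrix} -1 & 7 \\ -1 & 7 \end{bmatrix} \begin{bmatrix} v_{n+1/2} \\ v_n \end{bmatrix} + \frac{h}{24} \begin{bmatrix} 55 & -17 \\ 25 & 1 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \end{bmatrix},$

two steps $v_n$ and $v_{n+1/2}$ are involved to generate the next two steps $v_{n+1}$ and $v_{n+3/2}$.
For our method (if we utilize (26) and substitute (23), (24) for $\varepsilon_1$ and $\varepsilon_2$, respectively,
in (22) to avoid the zero denominator),

    $\begin{bmatrix} v_{n+3/2} \\ v_{n+1} \\ v_{n+1/2} \end{bmatrix} = \begin{bmatrix} -1/7 & 8/7 & 0 \\ -1/7 & 8/7 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_{n+1/2} \\ v_n \\ v_{n-1/2} \end{bmatrix} + \frac{h}{168} \begin{bmatrix} 572 & -496 & 188 \\ 201 & -48 & 27 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} f_{n+1/2} \\ f_n \\ f_{n-1/2} \end{bmatrix},$

we use the previous three steps $v_{n-1/2}$, $v_n$ and $v_{n+1/2}$ to evolve the next two steps $v_{n+1}$
and $v_{n+3/2}$. In [5] the stability analysis has been done for the adaptive radial basis
function methods for IVPs and it has been shown that some adaptive methods have a
better stability condition than the original ones. However, it seems that the adaptive
error inhibiting method is more computationally expensive than the original one
when the approximation (26) is used.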
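
The three-value form of the adaptive scheme can be sketched in the same way as the original one. The snippet below assumes the combined coefficients $(-1/7, 8/7)$ and $\frac{h}{168}(572, -496, 188)$, $\frac{h}{168}(201, -48, 27)$ obtained by inserting (23), (24) and (26) into (22), takes the three starting values from the exact solution (an assumption of this sketch), and uses the hypothetical name `adaptive_eis_solve`.

```python
import math

def adaptive_eis_solve(f, u_exact, t0, t_end, n_steps):
    """Adaptive error inhibiting scheme in its three-value form (autonomous f).

    Keeps v_{n-1/2}, v_n, v_{n+1/2}; the starting values are exact (assumed).
    """
    h = (t_end - t0) / n_steps
    v_prev = u_exact(t0 - 0.5 * h)      # v_{n-1/2}
    v = u_exact(t0)                     # v_n
    v_half = u_exact(t0 + 0.5 * h)      # v_{n+1/2}
    for _ in range(n_steps):
        f_p, f_n, f_m = f(v_half), f(v), f(v_prev)
        base = (-v_half + 8.0 * v) / 7.0
        new_half = base + h / 168.0 * (572.0 * f_p - 496.0 * f_n + 188.0 * f_m)
        new_v = base + h / 168.0 * (201.0 * f_p - 48.0 * f_n + 27.0 * f_m)
        v_prev, v, v_half = v_half, new_v, new_half   # shift the block by h
    return v                                          # approximation of u(t_end)

f = lambda u: -u * u
u_exact = lambda t: 1.0 / (t + 1.0)

err_40 = abs(adaptive_eis_solve(f, u_exact, 0.0, 1.0, 40) - u_exact(1.0))
err_80 = abs(adaptive_eis_solve(f, u_exact, 0.0, 1.0, 80) - u_exact(1.0))
order = math.log(err_40 / err_80, 2)   # observed order of the global error
```

Compared with the original scheme, the only extra cost per step in this form is keeping one additional function value, $f_{n-1/2}$, from the previous block.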


5 Numerical Results 


We start with the nonlinear first-order differential equation used in [3]

    $u' = -u^2, \quad t \ge 0,$
    $u(0) = 1.$    (27)

The exact solution of the example is $u(t) = 1/(t + 1)$. The left panel of Fig. 1 shows
the global errors at the time $t = 1$ versus $N$, the number of steps, in logarithmic
scale for the type-3 DIMSIM (blue), the original error inhibiting scheme (red) and
our proposed method (green). As seen in the figure, our proposed method is the
most accurate among the three methods and converges with fourth order.
Table 1 shows the convergence with $N$ for (27). The type-3 DIMSIM
yields second-order accuracy, the original error inhibiting scheme yields third-order
accuracy and our proposed method yields fourth-order accuracy.

Next we consider the following problem used in [6], where the solution changes
rapidly in $[-2, 2]$:

    $u' = -4t^3 u^2, \quad t \ge -10,$
    $u(-10) = 1/10001.$    (28)


Fig. 1 Global error versus $N$ in logarithmic scale. Left: (27). Right: (28). Blue: DIMSIM
(DIMSIM3), 2nd order. Red: error inhibiting scheme (EIS), 3rd order. Green: our proposed method
(EIS with $h^3$), 4th order
Table 1 Global error and order of convergence for $u' = -u^2$ with $u(0) = 1$

Method                                     N      Global error   Order
DIMSIM type-3
                                                                 2.0702
                                                                 2.0402

                                                  2.34E-5        2.0105
                                           320    5.82E-6        2.0053
Error inhibiting scheme
                                                  2.89E-5        2.9118
                                                  3.73E-6        2.9536
                                           80     4.74E-7        2.9763
                                                  5.97E-8        2.9880
                                                  7.50E-9        2.9940
Error inhibiting scheme with $h^3$ term
                                                  2.24E-6        3.5935
                                                  1.64E-7        3.7781
                                           80     1.11E-8        3.8833
                                                  7.22E-10       3.9400
                                                  4.61E-11       3.9698


The exact solution is $u(t) = 1/(t^4 + 1)$. The right panel of Fig. 1 shows the global
errors at $t = 0$ versus $N$ in logarithmic scale for the type-3 DIMSIM (blue), the
original error inhibiting method (red) and our proposed method (green). We verify
again that our proposed method is indeed the most accurate and yields the highest
order of convergence. Table 2 shows the convergence with $N$ for (28). Although
the type-3 DIMSIM does not reveal second-order accuracy in the beginning, it
eventually exhibits the expected order of accuracy. The original error inhibiting
scheme is third-order accurate and our proposed method is fourth-order accurate.


6 Conclusions 


In this note, we modified and improved the original error inhibiting block one-step
method proposed in [3] by introducing a free parameter. By exploiting the
parameter, the local truncation error is further reduced, resulting in a higher order of
the global error. It is numerically demonstrated that, with the proposed method, the
local truncation error is of third order and the global error of fourth order. As
mentioned in Sect. 4, we will investigate the stability of the error inhibiting method
and of our proposed method, as well as relaxing the fourth constraint in the error
inhibiting method, in our future research.
Table 2 Global error and order of convergence for $u' = -4t^3 u^2$ with $u(-10) = 1/10001$

Method                                     N      Global error   Order
DIMSIM type-3                                     9.05E-1
                                           400    7.24E-1        0.3221
                                                  4.07E-1        0.8293
                                                  1.49E-1        1.4476
                                                  4.24E-2        1.8158
                                                  1.10E-2        1.9475
Error inhibiting scheme
                                                  2.80E-2        2.7294
                                                  3.60E-3        2.9639
                                                  4.50E-4        2.9965
                                                  5.63E-5        3.0002
                                                  7.03E-6        3.0005
Error inhibiting scheme with $h^3$ term    200    1.14E-2

                                                  3.94E-5        4.0620
                                                  2.41E-6        4.0307
                                                  1.49E-7        4.0123
                                                  9.91E-9        3.9122


1 Introduction


Over the past decade a number of works have appeared which exploit the unique
properties of Hermite-Birkhoff interpolation in space to construct arbitrary-order
discretization methods for hyperbolic [1, 2, 4-7, 10, 14, 16-19] as well
as Schrödinger [3] equations. The precise form of the interpolant in a single cell,
which here we write in one dimension labelled $t$, is

    $u(t) \approx \mathcal{I}u(t) \in \Pi^{2m+1}, \quad t \in (t_{j-1}, t_j),$    (1)

    $\frac{d^k \mathcal{I}u}{dt^k}(t_\ell) = \frac{d^k u}{dt^k}(t_\ell), \quad k = 0, \dots, m, \quad \ell = j-1, j,$    (2)

where $\Pi^{2m+1}$ denotes the polynomials of degree $2m + 1$. (In higher dimensions
one uses a tensor-product cell interpolant based on vertex data consisting of mixed
derivatives of order through $m$ in each Cartesian coordinate.)

In contrast, there has been little work on analogous methods for time
discretization. A recent exception is the manuscript by Liu et al. [15]. They develop
methods for second-order semilinear hyperbolic equations using interpolants of
the form (1)-(2) combined with a reformulation of the evolution problem using
exact solutions of the linear part. They demonstrate excellent long-time
performance.

The outline of the paper is as follows. In Sect. 2 we list a few properties of
piecewise Hermite-Birkhoff interpolation. In Sect. 3 we construct the time-stepping
schemes and establish some basic results, with a few numerical experiments
described in Sect. 4.


2 Basic Properties of Hermite-Birkhoff Interpolation 


Hermite interpolants have a number of interesting properties which make them very
attractive for the solution of differential equations; see, e.g., [2]. Here we will mainly
use the simplest. Precisely, for $t \in (t_{j-1}, t_j)$, the Peano representation of the local
error can be easily derived by noting that $e = u - \mathcal{I}u$ solves the two-point boundary
value problem

    $\frac{d^{2m+2} e}{dt^{2m+2}} = \frac{d^{2m+2} u}{dt^{2m+2}}, \qquad \frac{d^k e}{dt^k} = 0, \quad t = t_{j-1}, t_j, \quad k = 0, \dots, m.$    (3)

Thus

    $e(t) = \int_{t_{j-1}}^{t_j} K_j(t, s)\, \frac{d^{2m+2} u}{ds^{2m+2}}(s)\, ds,$    (4)

where the kernel $K_j$ is the Green's function for (3). Simple scaling arguments
combined with the transformation $t = t_{j-1} + z h_j$ then show that $e = O(h_j^{2m+2})$,
where $h_j = t_j - t_{j-1}$ is the time step. A fundamental feature of piecewise
Hermite interpolation is the following orthogonality property. For any functions
$v(t)$, $w(t)$,

    $\int_{t_{j-1}}^{t_j} \frac{d^{m+1} \mathcal{I}v}{dt^{m+1}}\, \frac{d^{m+1} (w - \mathcal{I}w)}{dt^{m+1}}\, dt = 0,$    (5)

which in particular implies that interpolation reduces the $H^{m+1}$ seminorm.
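
The error estimate $e = O(h_j^{2m+2})$ is easy to observe in the simplest case. The sketch below takes $m = 1$ (cubic Hermite interpolation), interpolates $\sin t$ on $[t_0, t_0 + h]$ and measures the error at the midpoint, where the cubic has the closed form $\frac{1}{2}(u_0 + u_1) + \frac{h}{8}(u'_0 - u'_1)$. The test function, the interval and the name `hermite_mid_error` are hypothetical choices.

```python
import math

def hermite_mid_error(t0, h):
    """Midpoint error of the cubic (m = 1) Hermite interpolant of sin on [t0, t0 + h]."""
    u0, u1 = math.sin(t0), math.sin(t0 + h)
    d0, d1 = math.cos(t0), math.cos(t0 + h)
    # Value of the cubic Hermite interpolant at the midpoint of the cell.
    p_mid = 0.5 * (u0 + u1) + h * (d0 - d1) / 8.0
    return abs(math.sin(t0 + 0.5 * h) - p_mid)

e_coarse = hermite_mid_error(1.0, 0.1)
e_fine = hermite_mid_error(1.0, 0.05)
ratio = e_coarse / e_fine   # close to 2**4 for an O(h^4) error
```

Halving the step reduces the error by a factor near 16, consistent with $2m + 2 = 4$.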


3 Time-Stepping Methods 
We begin by considering the initial value problem for a first-order system of ordinary
differential equations:

    $\frac{du}{dt} = f(u, t), \quad u(0) = u_0, \quad u \in \mathbb{R}^d.$    (6)

Given a discrete time sequence $t_j > t_{j-1}$, $j = 1, \dots, N$, with time steps $h_j =
t_j - t_{j-1}$, we write down the Picard integral formulation of the time evolution over
a single step:

    $u(t_j) = u(t_{j-1}) + \int_{t_{j-1}}^{t_j} f(u(s), s)\, ds.$    (7)

The construction of our time integration formula proceeds in the following steps. We denote
by $v_j$ the approximation to $u(t_j)$.


1. Given $v_{j-1}$ and assuming for the moment that $v_j$ is known, use the differential
   equation to compute $m$ scaled derivatives of its solution, $V_\ell(t)$, satisfying
   $V_\ell(t_\ell) = v_\ell$, $\ell = j, j-1$. Setting

       $F_\ell(t) = f(V_\ell(t), t),$    (8)

   these are recursively defined by the formula

       $\frac{d^k V_\ell}{dt^k}(t_\ell) = \frac{d^{k-1} F_\ell}{dt^{k-1}}(t_\ell), \quad k = 1, \dots, m.$    (9)

2. Construct the Hermite-Birkhoff interpolant of this data; that is, the polynomial
   $P_{j-1/2}(t; v_{j-1}, v_j)$ of degree $2m + 1$ satisfying

       $\frac{d^k P_{j-1/2}}{dt^k}(t_\ell; v_{j-1}, v_j) = \frac{d^k V_\ell}{dt^k}(t_\ell), \quad \ell = j-1, j, \quad k = 0, \dots, m.$    (10)

3. Approximate (7) by replacing $u(t_\ell)$ by $v_\ell$ and replacing the integral by a $(q+1)$-point
   quadrature rule with $f$ evaluated at the Hermite interpolant:

       $v_j = v_{j-1} + h_j \sum_{k=0}^{q} w_k\, f\bigl(P_{j-1/2}(t_{j,k}; v_{j-1}, v_j), t_{j,k}\bigr).$    (11)

4. Solve (11) for $v_j$. Note that this is a system of $d$ nonlinear equations for any $m$;
   that is, unlike standard implicit Runge-Kutta methods, the size of the nonlinear
   system is independent of the order.


We remark that we have not studied in detail the unique solvability of (11) in the 
stiff case. In our numerical experiments we used the solution at the current time step 
as an initial approximation for Newton iterations and simply accepted the solution 
to which the iterates converged. 

To emphasize the ideas we write down some specific examples of methods with 
m = 1 and m = 2 making the simplifying assumption of autonomy; that is 
f = f(u). The derivation of methods of arbitrary order is straightforward and the 
formulas can be trivially obtained using software capable of symbolic computations. 
To apply them at higher order one must evaluate higher derivatives of f, which is 
also possible using automatic differentiation tools [11].

Example ($m = 1$) Set

$$\tau = \frac{t - t_{j-1}}{h_j}.$$

Now the interpolant $P_{j-1/2}(t; v_{j-1}, v_j)$ is given by

$$P_{j-1/2}(t; v_{j-1}, v_j) = \sum_{k=0}^{3} a_k \tau^k, \tag{12}$$

where

$$\begin{aligned}
a_0 &= v_{j-1}, \qquad a_1 = h_j f(v_{j-1}), \\
a_2 &= 3(v_j - v_{j-1}) - h_j\big(2 f(v_{j-1}) + f(v_j)\big), \\
a_3 &= -2(v_j - v_{j-1}) + h_j\big(f(v_{j-1}) + f(v_j)\big).
\end{aligned} \tag{13}$$

We next introduce a quadrature rule which is exact for polynomials of degree
3. Possible choices include the 2-point Gauss-Legendre rule (14), or the 3-point
Gauss-Radau (15) or Gauss-Lobatto rules. Note that by using two different rules we
obtain a possible error indicator. Here are the two different methods used below.
Note that the methods are identical if $f$ is linear.


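$$v_j = v_{j-1} + \frac{h_j}{2}\Big( f\big(P_{j-1/2}(\alpha_-; v_{j-1}, v_j)\big) + f\big(P_{j-1/2}(\alpha_+; v_{j-1}, v_j)\big) \Big), \tag{14}$$

$$v_j = v_{j-1} + \frac{h_j}{36}\Big( \beta_+ f\big(P_{j-1/2}(\gamma_-; v_{j-1}, v_j)\big) + \beta_- f\big(P_{j-1/2}(\gamma_+; v_{j-1}, v_j)\big) + 4 f(v_j) \Big), \tag{15}$$

$$\alpha_\pm = \frac{1}{2}\Big(1 \pm \frac{1}{\sqrt{3}}\Big), \qquad \gamma_\pm = \frac{4 \pm \sqrt{6}}{10}, \qquad \beta_\pm = 16 \pm \sqrt{6}. \tag{16}$$

Here the nodes $\alpha_\pm$ and $\gamma_\pm$ are given in the scaled time variable on $[0, 1]$.
A time step is executed by solving the nonlinear system, (14) or (15), for $v_j$.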
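
The following is a minimal sketch of one time step of the $m = 1$ method with the Gauss-Legendre rule (14), restricted to a scalar autonomous problem $u' = f(u)$. The function names (`hermite_cubic`, `residual`, `hermite_step`), the finite-difference increment, and the stopping tolerance are choices made for this illustration and are not taken from the authors' implementation.

```python
import math


def hermite_cubic(tau, v0, v1, h, f):
    """Evaluate the m = 1 interpolant (12)-(13) at scaled time tau in [0, 1]."""
    f0, f1 = f(v0), f(v1)
    a0 = v0
    a1 = h * f0
    a2 = 3.0 * (v1 - v0) - h * (2.0 * f0 + f1)
    a3 = -2.0 * (v1 - v0) + h * (f0 + f1)
    return a0 + tau * (a1 + tau * (a2 + tau * a3))


def residual(v1, v0, h, f):
    """Residual of (14): v1 - v0 - h * (two-point Gauss-Legendre sum)."""
    alpha_minus = 0.5 * (1.0 - 1.0 / math.sqrt(3.0))
    alpha_plus = 0.5 * (1.0 + 1.0 / math.sqrt(3.0))
    quad = 0.5 * (f(hermite_cubic(alpha_minus, v0, v1, h, f))
                  + f(hermite_cubic(alpha_plus, v0, v1, h, f)))
    return v1 - v0 - h * quad


def hermite_step(v0, h, f, tol=1e-13, max_iter=50):
    """Solve (14) for v_j by Newton's method with a finite-difference derivative."""
    v1 = v0  # initial guess: the solution at the current time step
    for _ in range(max_iter):
        r = residual(v1, v0, h, f)
        if abs(r) <= tol * max(1.0, abs(v1)):
            break
        eps = 1e-7 * max(1.0, abs(v1))  # illustrative increment
        dr = (residual(v1 + eps, v0, h, f) - r) / eps
        v1 -= r / dr
    return v1
```

For the linear test problem $f(u) = \lambda u$ with $h\lambda = -1$, a step of this sketch reproduces the value $7/19$ of the diagonal (2,2) Padé approximant, in line with the stability discussion in Sect. 3.1.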
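Example ($m = 2$) Now we also need the second time derivative of $u$,

$$\frac{d^2 u}{dt^2} = \frac{d}{dt} f(u) = J(u) f(u), \tag{17}$$

where $J(u)$ is the Jacobian derivative. The Hermite interpolant can now be written

$$P_{j-1/2}(t; v_{j-1}, v_j) = \sum_{k=0}^{5} a_k \tau^k, \tag{18}$$

where

$$a_0 = v_{j-1}, \qquad a_1 = h_j f(v_{j-1}), \qquad a_2 = \frac{h_j^2}{2} J(v_{j-1}) f(v_{j-1}), \tag{19}$$

$$\begin{aligned}
a_3 &= 10(v_j - v_{j-1}) - h_j\big(6 f(v_{j-1}) + 4 f(v_j)\big) - \frac{h_j^2}{2}\big(3 J(v_{j-1}) f(v_{j-1}) - J(v_j) f(v_j)\big), \\
a_4 &= -15(v_j - v_{j-1}) + h_j\big(8 f(v_{j-1}) + 7 f(v_j)\big) + \frac{h_j^2}{2}\big(3 J(v_{j-1}) f(v_{j-1}) - 2 J(v_j) f(v_j)\big), \\
a_5 &= 6(v_j - v_{j-1}) - 3 h_j\big(f(v_{j-1}) + f(v_j)\big) + \frac{h_j^2}{2}\big(J(v_j) f(v_j) - J(v_{j-1}) f(v_{j-1})\big).
\end{aligned}$$
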


Again we can now use, for example, the 3-point Gauss-Legendre or 4-point 
Gauss-Radau quadrature rules to produce the equation we must solve for vj. 
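
As a quick consistency check on the $m = 2$ coefficients, the sketch below builds the quintic in the scaled variable $\tau \in [0, 1]$ and evaluates its value and first two derivatives at the endpoints. The arguments `f0`, `f1`, `g0`, `g1` stand for $f(v_{j-1})$, $f(v_j)$, $J(v_{j-1}) f(v_{j-1})$ and $J(v_j) f(v_j)$; the function names are hypothetical, and the scalar setting is an assumption made to keep the example short.

```python
def quintic_coefficients(v0, v1, h, f0, f1, g0, g1):
    """Coefficients a_0..a_5 of the m = 2 Hermite interpolant in tau."""
    dv = v1 - v0
    a0 = v0
    a1 = h * f0
    a2 = 0.5 * h**2 * g0
    a3 = 10.0 * dv - h * (6.0 * f0 + 4.0 * f1) - 0.5 * h**2 * (3.0 * g0 - g1)
    a4 = -15.0 * dv + h * (8.0 * f0 + 7.0 * f1) + 0.5 * h**2 * (3.0 * g0 - 2.0 * g1)
    a5 = 6.0 * dv - 3.0 * h * (f0 + f1) + 0.5 * h**2 * (g1 - g0)
    return [a0, a1, a2, a3, a4, a5]


def poly_derivative_at(coeffs, tau, order):
    """Value of the order-th tau-derivative of sum_k coeffs[k] * tau**k."""
    total = 0.0
    for k, a in enumerate(coeffs):
        if k < order:
            continue
        factor = 1.0
        for i in range(order):
            factor *= (k - i)
        total += factor * a * tau ** (k - order)
    return total
```

Because $d/dt = h_j^{-1}\, d/d\tau$, the interpolation conditions (10) read $P = v$, $dP/d\tau = h_j f$ and $d^2P/d\tau^2 = h_j^2 J f$ at $\tau = 0$ and $\tau = 1$, which is what the check evaluates.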


3.1 Stability and Consistency 


The consistency of the method is a straightforward consequence of its construction, 
and its linear stability properties can also be established. 


Theorem 1 Assume that the quadrature rule has positive weights and is exact for
polynomials of degree $2m + 1$. Then the implicit Hermite method is A-stable and
accurate of order $2m + 2$.


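Proof Assume that $f$ is smooth and that $u(t) \in C^{2m+3}(0, T)$. Using (4), standard
estimates for quadrature errors, and the Picard formula (7) we find for the truncation
error

$$\tau_j = \frac{u(t_j) - u(t_{j-1})}{h_j} - \sum_k w_k\, f\big(P_{j-1/2}(t_{j,k}; u(t_{j-1}), u(t_j))\big), \tag{20}$$

$$\begin{aligned}
|\tau_j| &= \Big| \frac{1}{h_j} \int_{t_{j-1}}^{t_j} f(u(s))\, ds - \sum_k w_k\, f\big(P_{j-1/2}(t_{j,k}; u(t_{j-1}), u(t_j))\big) \Big| \\
&\le \Big| \frac{1}{h_j} \int_{t_{j-1}}^{t_j} f(u(s))\, ds - \sum_k w_k\, f(u(t_{j,k})) \Big| \\
&\quad + \Big| \sum_k w_k \Big( f(u(t_{j,k})) - f\big(P_{j-1/2}(t_{j,k}; u(t_{j-1}), u(t_j))\big) \Big) \Big| \\
&= O\big(h_j^{2m+2}\big).
\end{aligned} \tag{21}$$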


Now consider the Dahlquist test problem, $f(u) = \lambda u$. In this case all quadrature
rules which are exact for the Hermite interpolant produce the same method. As
interpolation is linear, the coefficients of the interpolant are linear
combinations of $h_j^k \lambda^k v_{j-1}$ and $h_j^k \lambda^k v_j$, $k = 0, \ldots, m$. The Picard integral then
increases the powers of $h_j \lambda$ by one so that the implicit system (11) can be rearranged
to

$$Q_+(h_j\lambda)\, v_j = Q_-(h_j\lambda)\, v_{j-1} \quad \Longrightarrow \quad v_j = \frac{Q_-(h_j\lambda)}{Q_+(h_j\lambda)}\, v_{j-1}, \tag{22}$$

where $Q_\pm(h_j\lambda)$ are polynomials of degree $m + 1$. Consistency implies

$$e^{h_j\lambda} = \frac{Q_-(h_j\lambda)}{Q_+(h_j\lambda)} + O\big((h_j\lambda)^{2m+3}\big). \tag{23}$$

The only rational function of the given degree with this accuracy is the diagonal
Padé approximant. We thus conclude that our methods are A-stable [12]. $\square$
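
To make the conclusion of the proof concrete, the sketch below evaluates the diagonal $(m+1, m+1)$ Padé approximant of $e^z$ from its standard closed-form coefficients and treats it as the amplification factor $v_j / v_{j-1}$ for $z = h_j\lambda$. The function names are hypothetical; the code only illustrates the statement of the theorem and is not part of the authors' software.

```python
from math import factorial


def diagonal_pade(n, z):
    """Diagonal (n, n) Pade approximant of exp(z), z real or complex."""
    num = 0.0
    den = 0.0
    for k in range(n + 1):
        c = factorial(2 * n - k) * factorial(n) / (
            factorial(2 * n) * factorial(k) * factorial(n - k))
        num += c * z ** k        # numerator polynomial
        den += c * (-z) ** k     # denominator is the numerator evaluated at -z
    return num / den


def stability_function(m, z):
    """Amplification factor of the implicit Hermite method for f(u) = lambda * u."""
    return diagonal_pade(m + 1, z)
```

On the imaginary axis the modulus of this function equals one, and in the left half-plane it is smaller than one, which is the A-stability property asserted in Theorem 1.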


4 Numerical Experiments 


Our first experiments treat standard problems from the ODE literature and are
restricted to the fourth and sixth order methods described above with either
Gauss-Legendre or Gauss-Radau quadrature. Our practical implementations employ the
classical Aitken algorithm adapted to Hermite interpolation to directly evaluate
$P_{j-1/2}(t_{j,k}; v_{j-1}, v_j)$ and solve (11) using Newton's method with the Jacobian of
the implicit system approximated by finite differences. For adaptive computations
we


1. Compute $v_j$ using the Gauss-Radau-based formulas,
2. Compute a residual, $\rho_j$, by substituting $v_j$ into the Gauss-Legendre-based
formulas.

We then adjust the time step by the simple rule

$$h_{j+1} = \left(\frac{\mathrm{tol}}{\rho_j}\right)^{1/(2m+3)} h_j, \tag{24}$$

while also imposing a minimum time step.

Our final experiment examines the use of the method for evolving spectral
discretizations of initial-boundary value problems for the Schrödinger equation.
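
The step-size rule (24) can be sketched as follows. The default minimum step `h_min` and the handling of a vanishing residual are assumptions made for this illustration; the text only states that a minimum time step is imposed.

```python
def next_step(h, rho, tol, m, h_min=1e-12):
    """Step-size update (24): h_new = (tol / rho)**(1 / (2m + 3)) * h, floored at h_min."""
    if rho <= 0.0:
        # Residual vanished: keep the current step (a choice made for this sketch).
        return h
    factor = (tol / rho) ** (1.0 / (2 * m + 3))
    return max(h_min, factor * h)
```

With $m = 2$ the exponent is $1/7$, so a residual 128 times larger than the tolerance halves the step.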


4.1 Arenstorf Orbit


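We first consider the problem of computing a periodic solution of the restricted
three-body problem, which we reformulate as a first-order system of four variables:

$$\begin{aligned}
\frac{d^2 y_1}{dt^2} &= y_1 + 2\frac{dy_2}{dt} - (1-\mu)\frac{y_1 + \mu}{\big((y_1+\mu)^2 + y_2^2\big)^{3/2}} - \mu\frac{y_1 - (1-\mu)}{\big((y_1 - (1-\mu))^2 + y_2^2\big)^{3/2}}, \\
\frac{d^2 y_2}{dt^2} &= y_2 - 2\frac{dy_1}{dt} - (1-\mu)\frac{y_2}{\big((y_1+\mu)^2 + y_2^2\big)^{3/2}} - \mu\frac{y_2}{\big((y_1 - (1-\mu))^2 + y_2^2\big)^{3/2}},
\end{aligned}$$

$$\mu = 0.012277471, \qquad y_1(0) = 0.994, \qquad \frac{dy_1}{dt}(0) = y_2(0) = 0,$$

$$\frac{dy_2}{dt}(0) = -2.00158510637908\ldots, \qquad T = 17.06521656015796\ldots.$$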


(For graphs of the solution see [13, Ch. II].) 

We note that this problem is not considered to be stiff. The main difficulty is 
a need for very small time steps when the orbits approach the singularities of 
f. However, we use it to verify convergence at the design order when (woefully 
inefficient) uniform time steps are employed and to test the utility of our naive time 
step adaptivity algorithm. 

Results for fixed (small) time steps are displayed in Table 1. We observe 
that convergence is at design order and that the results for the two quadrature 
formulas are comparable, though the fourth order Radau method is somewhat 
more accurate than Gauss-Legendre with roles reversed at sixth order. The sixth 
order methods are more accurate with larger time steps. The error is simply
$\sqrt{(y_1(T) - y_1(0))^2 + (y_2(T) - y_2(0))^2}$.
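
For readers who want to reproduce the setup, the restricted three-body system can be written as a first-order system in $(y_1, y_2, dy_1/dt, dy_2/dt)$. The sketch below is one such formulation; the names `arenstorf_rhs`, `Y0`, `T_PERIOD` and `closure_error` are hypothetical, the initial data are the standard values tabulated in the ODE literature, and the error measure is the distance between the positions at $t = T$ and $t = 0$.

```python
import math

MU = 0.012277471
MU_PRIME = 1.0 - MU

# Initial state (y1, y2, dy1/dt, dy2/dt) and period of the periodic orbit.
Y0 = (0.994, 0.0, 0.0, -2.00158510637908)
T_PERIOD = 17.06521656015796


def arenstorf_rhs(t, y):
    """Right-hand side of the restricted three-body problem as a first-order system."""
    y1, y2, v1, v2 = y
    d1 = ((y1 + MU) ** 2 + y2 ** 2) ** 1.5
    d2 = ((y1 - MU_PRIME) ** 2 + y2 ** 2) ** 1.5
    a1 = y1 + 2.0 * v2 - MU_PRIME * (y1 + MU) / d1 - MU * (y1 - MU_PRIME) / d2
    a2 = y2 - 2.0 * v1 - MU_PRIME * y2 / d1 - MU * y2 / d2
    return (v1, v2, a1, a2)


def closure_error(y_end, y_start=Y0):
    """Distance between the final and initial positions after one period."""
    return math.sqrt((y_end[0] - y_start[0]) ** 2 + (y_end[1] - y_start[1]) ** 2)
```

The large acceleration at the initial point, where the orbit passes close to the smaller body, is one reason why uniform time steps are so inefficient for this problem.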

Results for adaptive computations with $m = 2$ are shown in Table 2. Obviously,
the adaptive methods lead to a very significant reduction in the number of time
steps; an accuracy of $10^{-7}$ is achieved with 264 steps of the adaptive method
while 35,000 uniform steps are required. Due to the sensitivity of the problem, the
global error is much larger than the error tolerance, but is reduced in proportion to
it.

Table 2 Time steps and error as a function of tolerance for adaptive solutions of the
Arenstorf problem with $m = 2$

Tol | Error   | Steps
…   | …       | 65
…   | 1.8(-5) | 136
…   | 1.1(-8) | 264

Fig. 1 Solution and time step history for the van der Pol oscillator with tolerance $10^{-8}$


4.2 Van der Pol Oscillator 


Our second example is the van der Pol oscillator problem, which again we rewrite
as a first order system:

$$\frac{d^2 y}{dt^2} = \frac{1}{\epsilon}\Big( (1 - y^2)\frac{dy}{dt} - y \Big), \tag{25}$$

$$\epsilon = 10^{-6}, \qquad y(0) = 2, \qquad \frac{dy}{dt}(0) = 0.$$

We solve up to $T = 11$ using the adaptive method with $m = 2$. We plot the solution
and the time step histories for a tolerance of $10^{-8}$ in Fig. 1. Note that very small
steps are needed to resolve the fast transitions, while the problem is quite stiff in the
regions where $y$ is nearly constant. Plots for the other tolerances tested, $10^{-6}$ and
$10^{-10}$, are similar though the number of time steps required varies.

Fig. 2 Left: Relative errors for NLS with various time steps and $m = 3$. Right: Relative errors for
NLS with $h = 0.01$ and varying $m$


4.3 Schrödinger Equation


Lastly, we apply the method to evolve a Fourier pseudospectral discretization of the
nonlinear Schrödinger equation. Precisely we consider the real problem

$$\frac{\partial v}{\partial t} = -\frac{\partial^2 w}{\partial x^2} - (v^2 + w^2)\, w, \qquad \frac{\partial w}{\partial t} = \frac{\partial^2 v}{\partial x^2} + (v^2 + w^2)\, v, \tag{26}$$

for $x \in (-8, 8)$, $t \in (0, 3)$ with periodic boundary conditions at $x = \pm 8$. We
approximate the periodization of the exact solitary wave solution

$$\begin{aligned}
v(x, t) &= \sqrt{50}\, \cos(rx - st)\, \operatorname{sech}(5(x - ct)), \\
w(x, t) &= \sqrt{50}\, \sin(rx - st)\, \operatorname{sech}(5(x - ct)),
\end{aligned} \tag{27}$$


with $c = 2\pi$, $r = \pi$, $s = \pi^2 - 25$. We note that the amplitude of the solitary
wave is reduced by about 17 digits at a distance of 8 from its peak so that the
interaction with periodic copies is negligible over the simulation time. We use
512 Fourier modes in the computation of the derivatives and experiments show
that this is sufficient to represent the solitary wave to machine precision. The
implicit system was solved using Newton iterations each time step. In Fig. 2 we
present results for $m = 3$ (8th order) with varying time step and for $m$ varying
from 1 to 5 (order 4 through 12) with $h = 10^{-2}$. In both cases we observe
rapid convergence. We also tabulate the errors at the final time and calculate the
convergence rates when $m = 3$ in Table 3. The results are clearly consistent with
the design order.

Table 3 Relative errors for the Fourier pseudospectral discretization of the NLS (26)
with solitary wave solution (27)

h ($m = 3$) | Error    | Rate | m ($h = 10^{-2}$) | Error
3.0(-2)     | 4.0(-3)  |      | 1                 | 1.2(-2)
2.0(-2)     | 6.7(-5)  | 10.1 | 2                 | 5.0(-5)
1.5(-2)     | 4.9(-6)  | 9.1  | 3                 | 1.8(-7)
1.0(-2)     | 1.8(-7)  | 8.1  | 4                 | 8.6(-10)
7.5(-3)     | 2.2(-8)  | 7.3  | 5                 | 1.2(-10)
6.0(-3)     | 3.2(-9)  | 8.7  |                   |
5.0(-3)     | 9.4(-10) | 6.8  |                   |
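
The rates in Table 3 follow from consecutive error pairs through $\log(e_{i-1}/e_i) / \log(h_{i-1}/h_i)$. The short sketch below recomputes them from the tabulated (rounded) values for $m = 3$; the variable names are illustrative, and small differences from the printed rates are expected because the errors are given to two digits only.

```python
import math

# Time steps and relative errors for m = 3 as read from Table 3.
H = [3.0e-2, 2.0e-2, 1.5e-2, 1.0e-2, 7.5e-3, 6.0e-3, 5.0e-3]
ERR = [4.0e-3, 6.7e-5, 4.9e-6, 1.8e-7, 2.2e-8, 3.2e-9, 9.4e-10]


def observed_rates(h, err):
    """Observed convergence rates between consecutive rows of an (h, error) table."""
    return [math.log(err[i - 1] / err[i]) / math.log(h[i - 1] / h[i])
            for i in range(1, len(h))]
```

The recomputed values scatter around the design order $2m + 2 = 8$.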


5 Conclusions and Future Work 


In conclusion, we have demonstrated that Hermite-Birkhoff interpolation can be
used to develop singly-implicit A-stable timestepping methods of arbitrary order. A
number of generalizations and improvements to the method are possible.
These include

1. Stability analysis for variable coefficient or nonlinear problems using the
projection properties (5);
2. Improved time step/order adaptivity;
3. Preconditioning of the implicit system for applications to partial differential
equations such as spectral/pseudospectral discretizations of equations of
Schrödinger type (e.g. integration preconditioners [8, 9]);
4. Development of IMEX schemes combining Hermite and Taylor polynomials.


HPS Accelerated Spectral Solvers
for Time Dependent Problems: Part II,
Numerical Experiments


Tracy Babb, Per-Gunnar Martinsson, and Daniel Appelö 


1 Introduction 


This chapter describes a highly computationally efficient solver for equations of
the form

$$\kappa \frac{\partial u}{\partial t} = \mathcal{L} u(x, t) + h(u, x, t), \qquad x \in \Omega, \; t > 0, \tag{1}$$

with initial data $u(x, 0) = u_0(x)$. Here $\mathcal{L}$ is an elliptic operator acting on a fixed
domain $\Omega$ and $h$ collects lower order, possibly nonlinear terms. We take $\kappa$ to be real or
imaginary, allowing for parabolic and Schrödinger type equations. We desire the
benefits that can be gained from an implicit solver, such as L-stability and stiff
accuracy, which means that the computational bottleneck will be the solution of a
sequence of elliptic equations set on $\Omega$. In situations where the elliptic equation to
be solved is the same in each time-step, it is highly advantageous to use a direct (as
opposed to iterative) solver. In a direct solver, an approximate solution operator to
the elliptic equation is built once. The cost to build it is typically higher than the cost
required for a single elliptic solve using an iterative method such as multigrid, but
the upside is that after it has been built, each subsequent solve is very fast. In this
chapter, we argue that a particularly efficient direct solver to use in this context is a
method obtained by combining a multidomain spectral collocation discretization (a
so-called "patching method", see e.g. Ch. 5.13 in [3]) with a nested dissection type
solver. It has recently been demonstrated [1, 7, 12] that this combined scheme, which
we refer to as a "Hierarchical Poincaré-Steklov (HPS)" solver, can be used with very
high local discretization orders (up to $p = 20$ or higher) without jeopardizing either
speed or stability, as compared to lower order methods.

In this chapter, we investigate the stability and accuracy that is obtained when 
combining high-order time-stepping schemes with the HPS method for solving 
elliptic equations. We restrict attention to relatively simple geometries (mostly 
rectangles). The method can without substantial difficulty be generalized to domains 
that can naturally be expressed as a union of rectangles, possibly mapped via 
curvilinear smooth parameter maps. 

A longer version of this chapter with additional details is available at [2]. Also 
note that the conclusions are deferred to Part II of this paper (same issue). 


2 The Hierarchical Poincaré-Steklov Method 


In this section, we describe a computationally efficient and highly accurate tech- 
nique for solving an elliptic PDE of the form 


$$\begin{aligned}
[Au](x) &= g(x), \qquad x \in \Omega, \\
u(x) &= f(x), \qquad x \in \Gamma,
\end{aligned} \tag{2}$$

where $\Omega$ is a domain with boundary $\Gamma$, and where $A$ is a variable coefficient elliptic
differential operator

$$\begin{aligned}
[Au](x) = {} & -c_{11}(x)[\partial_1^2 u](x) - 2c_{12}(x)[\partial_1 \partial_2 u](x) - c_{22}(x)[\partial_2^2 u](x) \\
& + c_1(x)[\partial_1 u](x) + c_2(x)[\partial_2 u](x) + c(x)\, u(x)
\end{aligned}$$

with smooth coefficients. In the present context, (2) represents an elliptic solve
that is required in an implicit time-discretization technique of a parabolic PDE,
as discussed in Sect. 1. For simplicity, let us temporarily suppose that the domain $\Omega$
is rectangular; the extension to more general domains is discussed in Remark 1.

Our ambition here is merely to provide a high level description of the method; 
for implementation details, we refer to [1, 2, 7-9, 12, 13]. 


2.1 Discretization 


Fig. 1 The domain $\Omega$ is split into $n_1 \times n_2$ squares, each of size $h \times h$. In the figure, $n_1 = 3$ and
$n_2 = 2$. Then on each box, a $p \times p$ tensor product grid of Chebyshev nodes is placed, shown for
$p = 7$. At red nodes, the PDE (2) is enforced via collocation of the spectral differentiation matrix.
At the blue nodes, we enforce continuity of the normal fluxes. Observe that the corner nodes (gray)
are excluded from consideration

We split the domain $\Omega$ into $n_1 \times n_2$ boxes, each of size $h \times h$. Then on each box, we
place a $p \times p$ tensor product grid of Chebyshev nodes, as shown in Fig. 1. We use
collocation to discretize the PDE (2). With $\{x_i\}_{i=1}^{N}$ denoting the collocation points,
the vector $\mathbf{u}$ that represents our approximation to the solution $u$ of (2) is given simply
by $\mathbf{u}(i) \approx u(x_i)$. We then discretize (2) as follows:


1. For each collocation node that is internal to a box (red nodes in Fig. 1), we 
enforce (2) by directly collocating the spectral differential operator supported 
on the box, as described in, e.g., Trefethen [15]. 

2. For each collocation node on an edge between two boxes (blue nodes in Fig. 1),
we enforce that the normal fluxes across the edge be continuous. For instance,
for a node $x_i$ on a vertical line, we enforce that $\partial u / \partial x_1$ is continuous across the
edge by equating the values for $\partial u / \partial x_1$ obtained by spectral differentiation of
the boxes to the left and to the right of the edge. For an edge node that lies on
the external boundary $\Gamma$, simply evaluate the normal derivative at the node, as
obtained by spectral differentiation in the box that holds the node.

3. All corner nodes (gray in Fig. 1) are dropped from consideration. For an elliptic
operator of the form (2) with $c_{12} = 0$, it turns out that these values do not
contribute to any of the spectral derivatives on the interior nodes, which means
that the method without corner nodes is mathematically equivalent to the method
with corner nodes, see [5, Sec. 2.1] for details. When $c_{12} \neq 0$, one must in order
to drop the corner nodes include an extrapolation operator when evaluating the
terms involving the spectral representation of the mixed derivative $\partial^2 u / \partial x_1 \partial x_2$.
This may lead to a slight drop in the order of convergence, but the difference is
hardly noticeable in practice, and the exclusion of corner nodes greatly simplifies
the implementation of the method.


Since we exclude the corner nodes from consideration, the total number of nodes
in the grid equals $N = (p - 2)(p\, n_1 n_2 + n_1 + n_2) \approx p^2 n_1 n_2$. The discretization
procedure described then results in an $N \times N$ matrix $\mathbf{A}$. For a node $i$, the value of
$\mathbf{A}(i, :)\mathbf{u}$ depends on what type of node $i$ is:

$$\mathbf{A}(i, :)\mathbf{u} = \begin{cases}
[Au](x_i) & \text{for any interior (red) node}, \\
0 & \text{for any edge node (blue) not on } \Gamma, \\
\partial u / \partial n & \text{for any edge node (blue) on } \Gamma.
\end{cases}$$

This matrix А can be used to solve BVPs with a variety of different boundary 
conditions, including Dirichlet, Neumann, Robin, and periodic [12]. 
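
The node count can be checked by counting directly: each box contributes $(p-2)^2$ interior nodes, and each box edge (shared or on the external boundary) carries $p - 2$ nodes once the corners are dropped. The sketch below compares this count with the closed formula; the function names are hypothetical.

```python
def hps_node_count(p, n1, n2):
    """Count collocation nodes box by box, excluding corner nodes."""
    interior = (p - 2) ** 2 * n1 * n2
    # Horizontal edges: n1 per grid line, n2 + 1 lines; vertical edges likewise.
    edges = n1 * (n2 + 1) + n2 * (n1 + 1)
    return interior + (p - 2) * edges


def hps_node_count_formula(p, n1, n2):
    """Closed form N = (p - 2)(p n1 n2 + n1 + n2) quoted in the text."""
    return (p - 2) * (p * n1 * n2 + n1 + n2)
```

For the configuration of Fig. 1 ($p = 7$, $n_1 = 3$, $n_2 = 2$) both functions give 235 nodes.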

In many situations, a simple uniform mesh of the type shown in Fig.1 is 
not optimal, since the regularity in the solution may vary greatly, due to corner 
singularities, localized loads, etc. The HPS method can easily be adapted to handle 
local refinement. The essential difficulty that arises is that when boxes of different 
sizes are joined, the collocation nodes along the joint boundary will not align. It is 
demonstrated in [1, 5] that this difficulty can stably and efficiently be handled by 
incorporating local interpolation operators. 


2.2 A Hierarchical Direct Solver 


A key observation in previous work on the HPS method is that the sparse linear 
system that results from the discretization technique described in Sect.2.1 is 
particularly well suited for direct solvers, such as the well-known multifrontal 
solvers that compute an LU-factorization of a sparse matrix. The key is to minimize 
fill-in by using a so called nested dissection ordering [4, 6]. Such direct solvers 
are very powerful in a situation where a sequence of linear systems with the 
same coefficient matrix needs to be solved, since each solve is very fast once 
the coefficient matrix has been factorized. This is precisely the environment 
under consideration here. The particular advantage of combining the multidomain 
spectral collocation discretization described in Sect. 2.1 is that the time required for 
factorizing the matrix is independent of the local discretization order. As we will 
see in the numerical experiments, this enables us to attain both very high accuracy, 
and very high computational efficiency. 


Remark 1 (General Domains) For simplicity we restrict attention to rectangular 
domains in this chapter. The extension to domains that can be mapped to a union 
of rectangles via smooth coordinate maps is relatively straight-forward, since the 
method can handle variable coefficient operators [12, Sec. 6.4]. Some care must be 
exercised since singularities may arise at intersections of parameter maps, which 
may require local refinement to maintain high accuracy. 


The direct solver described exactly mimics the classical nested dissection
method, and has the same asymptotic complexity of $O(N^{1.5})$ for the "build" (or
"factorization") stage, and then $O(N \log N)$ cost for solving a system once the
coefficient matrix has been factorized. Storage requirements are also $O(N \log N)$.
A more precise analysis of the complexity that takes into account the dependence
on the order $p$ of the local discretization shows [1] that $T_{\mathrm{build}} \sim N p^4 + N^{1.5}$, and
$T_{\mathrm{solve}} \sim N p^2 + N \log N$.


3 Time-Stepping Methods 


For high-order time-stepping of (1), we use the so-called Explicit, Singly Diagonally
Implicit Runge-Kutta (ESDIRK) methods. These methods have a Butcher diagram
with a constant diagonal $\gamma$ and are of the form

$$\begin{array}{c|cccccc}
0 & 0 & & & & & \\
2\gamma & \gamma & \gamma & & & & \\
c_3 & a_{3,1} & a_{3,2} & \gamma & & & \\
\vdots & \vdots & \vdots & \ddots & \ddots & & \\
c_{s-1} & a_{s-1,1} & a_{s-1,2} & a_{s-1,3} & \cdots & \gamma & \\
1 & b_1 & b_2 & b_3 & \cdots & b_{s-1} & \gamma \\
\hline
 & b_1 & b_2 & b_3 & \cdots & b_{s-1} & \gamma
\end{array}$$

ESDIRK methods offer the advantages of stiff accuracy and L-stability. They are
particularly attractive when used in conjunction with direct solvers since the elliptic
solve required in each stage involves the same coefficient matrix $(I - h\gamma\mathcal{L})$, where
$h$ is the time-step.

In general we split the right hand side of (1) into a stiff part, $F^{[1]}$, that will
be treated implicitly using ESDIRK methods, and a part, $F^{[2]}$, that will be treated
explicitly (with a Butcher table denoted $\hat{c}$, $\hat{A}$, and $\hat{b}$). Precisely we will use the
Additive Runge-Kutta (ARK) methods by Carpenter and Kennedy [11], of order 3,
4 and 5.

We may choose to formulate the Runge-Kutta method in terms of either solving
for slopes or solving for stage solutions. We denote these the $k_i$ formulation and the
$u_i$ formulation, respectively. When solving for slopes the stage computation is

$$k_i^I = F^{[1]}\Big(t_n + c_i \Delta t,\; u^n + \Delta t \sum_{j=1}^{s} a_{ij} k_j^I + \Delta t \sum_{j=1}^{s} \hat{a}_{ij} k_j^E\Big), \qquad i = 1, \ldots, s, \tag{3}$$

$$k_i^E = F^{[2]}\Big(t_n + c_i \Delta t,\; u^n + \Delta t \sum_{j=1}^{s} a_{ij} k_j^I + \Delta t \sum_{j=1}^{s} \hat{a}_{ij} k_j^E\Big), \qquad i = 1, \ldots, s. \tag{4}$$

Note that the explicit nature of (4) is encoded in the fact that the elements on the
diagonal and above in $\hat{A}$ are zero. Once the slopes have been computed the solution
at the next time-step is assembled as

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^{s} b_j k_j^I + \Delta t \sum_{j=1}^{s} \hat{b}_j k_j^E. \tag{5}$$


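If the method is instead formulated in terms of solving for the stage solutions the
implicit solves take the form

$$u_i = u^n + \Delta t \sum_{j=1}^{s} \Big( a_{ij} F^{[1]}(t_n + c_j \Delta t, u_j) + \hat{a}_{ij} F^{[2]}(t_n + c_j \Delta t, u_j) \Big),$$

and the explicit update for $u^{n+1}$ is given by

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^{s} b_j \Big( F^{[1]}(t_n + c_j \Delta t, u_j) + F^{[2]}(t_n + c_j \Delta t, u_j) \Big).$$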


The two formulations are algebraically equivalent but offer different advantages. 
For example, when working with the slopes we do not observe (see experiments 
presented in the second part of this paper) any order reduction due to time-dependent 
boundary conditions (see e.g. the analysis by Rosales et al. [14]). On the other hand 
and as discussed in some detail below, in solving for the slopes the HPS framework 
requires an additional step to enforce continuity. 

We note that it is generally preferred to solve for the slopes when implementing 
implicit Runge-Kutta methods, particularly when solving very stiff problems where 
the influence of roundoff (or solver tolerance) errors can be magnified by the 
Lipschitz constant when solving for the stages directly. 
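
The algebraic equivalence of the two formulations, and the fact that every implicit stage shares the same operator, can be illustrated on the scalar problem $u' = \lambda u$. The sketch below uses the three-stage TR-BDF2 scheme written as an ESDIRK tableau; this tableau is swapped in because its coefficients are short and well known, and it is not one of the ARK methods of [11] used in the chapter. All names are hypothetical.

```python
import math

# TR-BDF2 as an ESDIRK tableau: explicit first stage, constant diagonal GAMMA.
GAMMA = 1.0 - math.sqrt(2.0) / 2.0
B1 = math.sqrt(2.0) / 4.0
A = [[0.0, 0.0, 0.0],
     [GAMMA, GAMMA, 0.0],
     [B1, B1, GAMMA]]
B = A[-1]  # stiffly accurate: b_j = a_{sj}


def step_slopes(u, dt, lam):
    """One step of u' = lam * u, solving for the slopes k_i."""
    k = []
    for i in range(len(A)):
        known = u + dt * sum(A[i][j] * k[j] for j in range(i))
        # (1 - dt * a_ii * lam) k_i = lam * known; stage 1 is explicit (a_11 = 0).
        k.append(lam * known / (1.0 - dt * A[i][i] * lam))
    return u + dt * sum(b * kj for b, kj in zip(B, k))


def step_stages(u, dt, lam):
    """The same step, solving for the stage solutions u_i and returning u_s."""
    stages = []
    for i in range(len(A)):
        rhs = u + dt * sum(A[i][j] * lam * stages[j] for j in range(i))
        # Same operator (1 - dt * gamma * lam) as in the slope formulation.
        stages.append(rhs / (1.0 - dt * A[i][i] * lam))
    return stages[-1]
```

In the PDE setting the scalar division by $1 - \Delta t\, \gamma \lambda$ becomes a solve with $(I - \Delta t\, \gamma \mathcal{L})$, which is the step that a prefactorized direct solver makes cheap.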


Remark 2 The HPS method for elliptic solves was previously used in [10], which
considered a linear hyperbolic equation

$$\frac{\partial u}{\partial t} = \mathcal{L} u(x, t), \qquad x \in \Omega, \; t > 0,$$

where $\mathcal{L}$ is a skew-Hermitian operator. The evolution of the numerical solution can
be performed by approximating the propagator $\exp(\tau\mathcal{L}) : L^2(\Omega) \to L^2(\Omega)$ via a
rational approximation

$$\exp(\tau\mathcal{L}) \approx \sum_{m=-M}^{M} b_m (\tau\mathcal{L} - \alpha_m)^{-1}.$$

If application of $(\tau\mathcal{L} - \alpha_m)^{-1}$ to the current solution can be reduced to the solution
of an elliptic-type PDE it is straightforward to apply the HPS scheme to each term
in the approximation. A drawback with this approach is that multiple operators must
be formed and it is also slightly more convenient to time step non-linear equations
using the Runge-Kutta methods we use here.


There are two modifications to the HPS algorithm that are necessitated by the use
of ARK time integrators; we discuss these in the next two subsections.


3.1 Neumann Data Correction in the Slope Formulation 


In the HPS algorithm the PDE is enforced on interior nodes and continuity of the
normal derivative is enforced on the leaf boundary. Now, due to the structure of the
update formula (5), if at some time $u^n$ has an error component in the null space of
the operator that is used to solve for a slope $k_i$, then this will remain throughout the
solution process. Although this does not affect the stability of the method it may
result in loss of relative accuracy as the solution evolves. As a concrete example
consider the heat equation

$$u_t = u_{xx}, \qquad x \in [0, 2], \; t > 0, \tag{6}$$

with the initial data $u(x, 0) = 1 - |x - 1|$, and with homogeneous Dirichlet boundary
conditions. We discretize this on two leaves which we denote by $\alpha$ and $\beta$.

Now in the $k_i$ formulation, we solve several PDEs for the $k_i$ values and update
the solution as

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^{s} b_j k_j.$$

Here, even though the individual slopes have continuous derivatives the kink in $u^n$
will be propagated to $u^{n+1}$. In this particular example we would end up with the
incorrect steady state solution $u(x, t) = 1 - |x - 1|$.

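Fortunately, this can easily be mitigated by adding a consistent penalization of
the jump in the derivative of the solution during the merging of two leaves (for
details see Section 4 in [1]). That is, if we denote the jump by $[[\cdot]]$ we replace the
condition $0 = [[Tk + h^k]]$, where $Tk$ is the derivative from the homogeneous part
and $h^k$ is the derivative for the particular solution (of the slope), by the condition
$[[Tk + h^k - \Delta t^{-1} h^n]] = 0$. In comparison to [1] we get the slightly modified merge
formula

$$k_{i,3} = \big(T^\alpha_{3,3} - T^\beta_{3,3}\big)^{-1} \Big( T^\beta_{3,2} k_{i,2} - T^\alpha_{3,1} k_{i,1} + h_3^{k,\beta} - h_3^{k,\alpha} - \frac{1}{\Delta t}\big(h_3^{n,\beta} - h_3^{n,\alpha}\big) \Big),$$

along with the modified equation for the fluxes of the particular solution on the
parent box

$$\begin{aligned}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = {} & \left( \begin{bmatrix} T^\alpha_{1,1} & 0 \\ 0 & T^\beta_{2,2} \end{bmatrix} + \begin{bmatrix} T^\alpha_{1,3} \\ T^\beta_{2,3} \end{bmatrix} \big(T^\alpha_{3,3} - T^\beta_{3,3}\big)^{-1} \big[ -T^\alpha_{3,1} \;\big|\; T^\beta_{3,2} \big] \right) \begin{bmatrix} k_{i,1} \\ k_{i,2} \end{bmatrix} \\
& + \begin{bmatrix} h_1^{k,\alpha} \\ h_2^{k,\beta} \end{bmatrix} + \begin{bmatrix} T^\alpha_{1,3} \\ T^\beta_{2,3} \end{bmatrix} \big(T^\alpha_{3,3} - T^\beta_{3,3}\big)^{-1} \Big( h_3^{k,\beta} - h_3^{k,\alpha} - \frac{1}{\Delta t}\big(h_3^{n,\beta} - h_3^{n,\alpha}\big) \Big).
\end{aligned}$$

Due to space we must refer to [1] for a detailed discussion of these equations.
Briefly, $h^{k,\alpha}$ and $h^{k,\beta}$ above denote the spectral derivative on each child's boundary
for the particular solution to the PDE for $k_i$ and are already present in [1]. However,
$h^{n,\alpha}$ and $h^{n,\beta}$, which denote the spectral derivative of $u^n$ on the boundary from each
child box, are new additions.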

The above initial data is of course extreme but we note that the problem persists 
for any non-polynomial initial data with the size of the (stationary) error depending 
on resolution of the simulation. We further note that the described penalization 
removes this problem without affecting the accuracy or performance of the overall 
algorithm. 


Remark 3 Although for linear constant coefficient PDE it may be possible to project 
the initial data in a way so that interior affine functions do not cause the difficulty 
above, for greater generality, we have chosen to enforce the extra penalization 
throughout the time evolution. 


Remark 4 When utilizing the $u_i$ formulation in a purely implicit problem we do not
encounter the difficulty described above. This is because we enforce continuity of
the derivative in $u_s$ when solving

$$u_s - \Delta t\, \gamma \mathcal{L} u_s = u^n + \Delta t \sum_{j=1}^{s-1} a_{sj} \mathcal{L} u_j + \Delta t \sum_{j=1}^{s-1} a_{sj}\, g(x, t_n + c_j \Delta t),$$

followed by the update $u^{n+1} = u_s$.


3.2 Enforcing Continuity in the Explicit Stage 


The second modification is to the first explicit stage in the $k_i$ formulation. Solving a
problem with no forcing this stage is simply

$$k_1^I = \mathcal{L} u^n.$$

When, for example, $\mathcal{L}$ is the Laplacian, we must evaluate it on all nodes on the
interior of the physical domain. This includes the nodes on the boundary between
two leaves where the spectral approximation to the Laplacian can be different if
we use values from different leaves. The seemingly obvious choice, replacing
the Laplacian on the leaf boundary by the average, leads to instability. However,
stability can be restored if we enforce $k_1^I = \mathcal{L} u^n$ on the interior of each leaf
and continuity of the derivative across each leaf boundary. Algorithmically, this is
straightforward as these are the same conditions that are enforced in the regular
HPS algorithm, except in this case we simply have an identity equation for $k_1$ on the
interior nodes instead of a full PDE.

Although it is convenient to enforce continuity of the derivative using the regular
HPS algorithm it can be done in a more efficient fashion by forming a separate
system of equations involving only data on the leaf boundary nodes. In a single
dimension on a discretization with $n$ leaves this reduces the work associated with
enforcing continuity of the derivative across leaf boundary nodes from solving
$n \times (p - 1) - 1$ equations for $n \times (p - 1) - 1$ unknowns to solving a tridiagonal system
of $n - 1$ equations for $n - 1$ unknowns.

In two dimensions the system is slightly different, but if we have $n \times n$ leaves with
$p \times p$ Chebyshev nodes on each leaf then eliminating the explicit equations for the
interior nodes reduces the system to $(p - 2) \times 2n$ independent tridiagonal systems of
$n - 1$ equations with $n - 1$ unknowns for a total of $(p - 2) \times 2n \times (n - 1)$ equations
with $(p - 2) \times 2n \times (n - 1)$ unknowns.
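
Each of the small tridiagonal systems mentioned above can be solved in linear time. The sketch below uses the standard Thomas algorithm without pivoting, which is an assumption of this illustration; the text does not specify the solver, nor the entries of the leaf-boundary system, so the matrix used in any experiment with this code is a placeholder. The helper `leaf_boundary_unknowns_2d` simply evaluates the count quoted in the text.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting).

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies
    x[i+1]; lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def leaf_boundary_unknowns_2d(p, n):
    """Total unknowns (p - 2) * 2n * (n - 1) for n x n leaves with p x p nodes."""
    return (p - 2) * 2 * n * (n - 1)
```

Since the systems are independent, they can also be solved concurrently.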

When the $u_i$ formulation is used for a fully implicit problem the intermediate
stage values still require us to evaluate $\mathcal{L} u^n$, but this quantity only enters through
the body load in the intermediate stage PDEs. The explicit first stage in this
formulation is simply $u_1 = u^n$. Furthermore, while we must calculate

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^{s} b_j \mathcal{L} u_j,$$

this is equivalent to $u_s$ since $b_j = a_{sj}$ and we simply take $u^{n+1} = u_s$.

When both explicit and implicit terms are present, we proceed differently. Now,
the values of $u_i$ look almost identical to the implicit case and we still avoid the
problem of an explicit "solve" in $u_1$, but we also have

$$u^{n+1} = u^n + \Delta t \sum_{j=1}^{s} b_j \Big( F^{[1]}(t_n + c_j \Delta t, u_j) + F^{[2]}(t_n + c_j \Delta t, u_j) \Big).$$

The ESDIRK method has the property that $b_j = a_{sj}$, but for the explicit Runge-Kutta
method we have $b_j \neq \hat{a}_{sj}$. When the explicit operator $F^{[2]}$ does not contain
partial derivatives we need not enforce continuity of the derivative and can simply
reformulate the method as

$$u^{n+1} = u_s + \Delta t \sum_{j=1}^{s} (a_{sj} - \hat{a}_{sj})\, F^{[2]}(t_n + c_j \Delta t, u_j).$$

4 Boundary Conditions 


The above description for Runge-Kutta methods does not address how to impose 
boundary conditions for a system of ODEs resulting from a discretization of a PDE. 
In particular, the different formulations incorporate boundary conditions in slightly 
different ways. 

In this work we consider Dirichlet, Neumann, and periodic boundary conditions.
For periodic boundary conditions the intermediate stage boundary conditions are
enforced to be periodic for both formulations. As the $k_i$ stage values are
approximations to the time derivative of $u$, the imposed Dirichlet boundary conditions for
$x \in \Gamma$ are $k_i^I = u_t(x, t_n + c_i \Delta t)$. When solving for $u_i$ one may attempt to enforce
boundary conditions using $u_i = u(x, t_n + c_i \Delta t)$, $x \in \Gamma$. However, as demonstrated
in part two of this series and discussed in detail in [14], this results in order reduction
for time dependent boundary conditions.

In the HPS algorithm, Neumann or Robin boundary conditions are mapped to
Dirichlet boundary conditions using the linear Dirichlet to Neumann operator as
discussed for example in [1].


On the Use of Hermite Functions
for the Vlasov-Poisson System

Lorella Fatone, Daniele Funaro, and Gianmarco Manzini 


1 Introduction 


A semi-Lagrangian spectral method has been proposed in [8] for the numerical 
approximation of the nonrelativistic Vlasov-Poisson equations, which describe 
the dynamics of a collisionless plasma of charged particles, coupled under the 
effect of their own electric field. We assume for simplicity that the development 
of the plasma is only due to electrons. Moreover, we just treat the case of a 
1D-1V distribution function, defined in a phase space consisting of the two one- 
dimensional independent variables x (space) and v (velocity). The approximation 
introduced in [8] has been initially developed and tested on Fourier-Fourier periodic 
discretizations, for both variables in the phase space. In the successive paper [9], 
the approximation in the variable v has been approached with the help of Hermite
functions, i.e., Hermite polynomials multiplied by the Gaussian weight $\exp(-v^2)$.
Semi-Lagrangian methods for plasma physics calculations were originally
proposed in [5, 18] and more recently in [6, 15, 16]. By this approach, at different times,
the solution is approximated at the nodes of a Cartesian grid covering the
space-velocity domain. The solution at each space-velocity node is traced back along the
characteristic curve originating backward from that node. In [8] a high-order Taylor
expansion of the characteristic curves is used to trace back the solution in time,
which is then approximated by spectral interpolation. Such a method guarantees the
conservation of the main physical quantities (charge, mass, and momentum).

The first attempt at using Hermite polynomials to solve the Vlasov equation dates
back to the work [10], where the Hermite basis is used in the velocity variable to
describe a plasma in a physical state near the thermodynamic equilibrium. Within
this approach, exact discrete conservation laws can be constructed [7, 13, 14, 20,
21]. The weight function of the Hermite basis can be generalized by introducing
a parameter $\alpha$ in such a way that it becomes $\exp(-\alpha^2 v^2)$. A proper choice of this
parameter can significantly improve the convergence [2, 3, 19]. This fact was also
confirmed in earlier works on plasma physics based on Hermite spectral methods
(see [11, 17] and more recently [4]).

The paper is organized as follows. In Sect. 2, we present the continuous model, 
i.e., the 1D-1V Vlasov equation. In Sect. 3, we introduce the spectral approximation 
in the phase space. In Sect. 4, we present the semi-Lagrangian schemes based on an 
approximation of the characteristic curves coupled with a second-order backward 
differentiation formula (BDF). In Sect. 5, we numerically assess the performance of 
the method for a standard test case, and we show how the solution's behavior can 
be affected by the choice of a certain parameter $\beta$, acting on the location of the Hermite
weight function.


2 The Continuous Model 


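We deal with the 1D-1V Vlasov equation defined in the domain $\Omega = \Omega_x \times \mathbb{R}$, with
$\Omega_x \subseteq \mathbb{R}$. The unknown $f = f(t, x, v)$ denotes the probability of finding negatively
charged particles at the location x with velocity v. It is the solution of the problem

\[ \frac{\partial f}{\partial t} + v\,\frac{\partial f}{\partial x} - E(t, x)\,\frac{\partial f}{\partial v} = 0, \qquad t \in (0, T], \quad x \in \Omega_x, \quad v \in \mathbb{R}. \tag{1} \]

At time $t = 0$ we have the initial distribution $f(0, x, v) = \bar f(x, v)$. The problem is
nonlinear, since the electric field E is coupled with f. Indeed, we set

\[ \frac{\partial E}{\partial x}(t, x) = 1 - \int_{\mathbb{R}} f(t, x, v)\, dv = 1 - \rho(t, x), \tag{2} \]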


where $\rho$ denotes the electron charge density. System (1)-(2) in the unknowns f and
E is a simplification of the Vlasov-Poisson equations in two or three dimensional
space domains. Uniqueness of the solution is ensured by imposing that

\[ \int_{\Omega_x} E(t, x)\, dx = 0, \quad \text{which implies that} \quad \int_{\Omega_x} \rho(t, x)\, dx = |\Omega_x|, \tag{3} \]

where $|\Omega_x|$ is the size of $\Omega_x$. We assume periodic boundary conditions in the
variable x and a suitable exponential decay at infinity for the variable v. After
integration and by using the boundary constraints, we obtain the conservation of
mass

\[ \frac{d}{dt} \int_{\Omega} f(t, x, v)\, dx\, dv = 0. \tag{4} \]


When f and E are smooth enough, for a sufficiently small $\delta > 0$, the local system
of characteristics associated with (1) is given by the curves $(X(\tau), V(\tau))$ solving

\[ \frac{dX}{d\tau} = -V(\tau), \qquad \frac{dV}{d\tau} = E(\tau, X(\tau)), \qquad \tau \in\, ]t - \delta,\, t + \delta[, \tag{5} \]

with the condition that $(X(\tau), V(\tau)) = (x, v)$ when $\tau = t$. With this setting we have
in mind that for $t > 0$ we proceed backward. Under suitable regularity assumptions,
there exists a unique solution of the Vlasov-Poisson problem (1)-(2) which is
formally obtained by propagating the initial condition along the characteristic curves
described by (5), i.e. we have

\[ f(t, x, v) = \bar f(X(0), V(0)), \tag{6} \]

where we recall that $\bar f$ is the initial datum. By using the first-order approximation

\[ X(\tau) = x - v(\tau - t), \qquad V(\tau) = v + E(t, x)(\tau - t), \tag{7} \]

the Vlasov equation is satisfied up to an error decaying as $|\tau - t|$, for $\tau$ tending to $t$.


3 Phase-Space Discretization 


We briefly recall the construction of the approximation method proposed in [8]. At 
each point of a given grid, the new value of the discrete solution is set up to be equal 
to the value obtained by going backward, by a suitably small amount, along the 
local characteristic lines. The algorithm follows from a Taylor expansion of arbitrary 
order, where the derivatives in the variable x and v are carried out with spectral 
accuracy. In particular, for the variable x we consider the domain $\Omega_x = [0, 2\pi[$.
Given the positive integer N, we have the equispaced nodes $x_i = 2\pi i/N$, $i =
0, 1, \ldots, N - 1$. Regarding the direction v, when M is a given positive integer,
the nodes $v_j$, $j = 0, 1, \ldots, M - 1$, are the zeros of $H_M$, which is the Hermite
polynomial of degree M.
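
As an illustration of the grid just described, the following minimal sketch builds the equispaced nodes in x and the zeros of $H_M$ in v. It assumes NumPy's physicists' Hermite convention (weight $e^{-v^2}$); the function name `phase_space_nodes` is hypothetical and not taken from [8] or [9].

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def phase_space_nodes(N, M):
    """Return x-nodes, v-nodes and Gauss-Hermite weights (illustrative helper).

    x_i = 2*pi*i/N for i = 0..N-1 on [0, 2*pi[.
    v_j are the zeros of the physicists' Hermite polynomial H_M.
    """
    x = 2.0 * np.pi * np.arange(N) / N
    # hermgauss returns the zeros of H_M together with the quadrature
    # weights for integrals of the form int p(v) exp(-v^2) dv.
    v, w = hermgauss(M)
    return x, v, w


x, v, w = phase_space_nodes(8, 6)
```

The quadrature weights are returned as well because the same nodes can later serve to integrate in the velocity variable.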

We introduce the polynomial Lagrangian basis functions for the x and v
variables, which satisfy $B_i^{(N)}(x_n) = \delta_{in}$ and $B_j^{(M)}(v_m) = \delta_{jm}$, where $\delta_{ij}$ is the usual
Kronecker symbol. We recall that Hermite functions are obtained from Hermite
polynomials after multiplication by the weight $\omega(v) = e^{-v^2}$. We also define the
discrete spaces

\[ X_N = \mathrm{span}\{B_i^{(N)}\}_{i=0,1,\ldots,N-1}, \qquad Y_{N,M} = \mathrm{span}\{B_i^{(N)} B_j^{(M)} \omega\}_{i=0,1,\ldots,N-1;\ j=0,1,\ldots,M-1}. \tag{8} \]

Any function $f_{N,M}$ that belongs to $Y_{N,M}$ can be represented as

\[ f_{N,M}(x, v) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c_{ij}\, B_i^{(N)}(x)\, B_j^{(M)}(v)\, \omega(v), \tag{9} \]

where the coefficients of such an expansion are given by $c_{ij} = f_{N,M}(x_i, v_j)$.
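In the following, the matrices $d^{(N,s)}$ and $d^{(M,s)}$ denote the s-th derivative of $B_i^{(N)}$
evaluated at point $x_n$ and of $(B_j^{(M)}\omega)$ evaluated at point $v_m$:

\[ d_{ni}^{(N,s)} = \frac{d^s B_i^{(N)}}{dx^s}(x_n), \qquad d_{mj}^{(M,s)} = \frac{d^s (B_j^{(M)}\omega)}{dv^s}(v_m). \tag{10} \]

As a special case, we set $d_{ni}^{(N,0)} = \delta_{ni}$, $d_{mj}^{(M,0)} = \delta_{mj}$.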

Now, let us assume that the one-dimensional function $E_N \in X_N$ is known. Given
$\Delta t > 0$, by taking $\tau = t - \Delta t$ in formula (7), we define the new set of points $\tilde x_{nm} =
x_n - v_m \Delta t$ and $\tilde v_{nm} = v_m + E_N(x_n) \Delta t$. To evaluate a function $f_{N,M} \in Y_{N,M}$ at
the new points $(\tilde x_{nm}, \tilde v_{nm})$ through the coefficients $c_{ij}$, we use a Taylor expansion in
time. By omitting the terms in $\Delta t$ of order higher than one, we get


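\[ B_i^{(N)}(\tilde x_{nm})\, (B_j^{(M)}\omega)(\tilde v_{nm}) \approx \delta_{in}\,\delta_{jm}\,\omega(v_m) - v_m\, \Delta t\, \delta_{jm}\,\omega(v_m)\, d_{ni}^{(N,1)} + E_N(x_n)\, \Delta t\, \delta_{in}\, d_{mj}^{(M,1)}. \tag{11} \]

By substituting (11) in (9), we obtain the approximation

\[ f_{N,M}(\tilde x_{nm}, \tilde v_{nm}) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c_{ij}\, B_i^{(N)}(\tilde x_{nm})\, B_j^{(M)}(\tilde v_{nm})\, \omega(\tilde v_{nm}) \]
\[ \approx c_{nm}\,\omega(v_m) - v_m\,\omega(v_m)\, \Delta t \sum_{i=0}^{N-1} d_{ni}^{(N,1)} c_{im} + E_N(x_n)\, \Delta t \sum_{j=0}^{M-1} d_{mj}^{(M,1)} c_{nj}, \tag{12} \]

which is the main building block for more advanced schemes.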
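
The new points used in this expansion, $\tilde x_{nm} = x_n - v_m \Delta t$ and $\tilde v_{nm} = v_m + E_N(x_n)\Delta t$, can be formed for the whole grid at once. The sketch below is only an illustration under that first-order formula; the function name `foot_points` and the array layout (rows indexed by n, columns by m) are assumptions.

```python
import numpy as np


def foot_points(x, v, E, dt):
    """First-order backward foot points of the characteristics (illustrative).

    x : (N,) nodes in space, v : (M,) nodes in velocity,
    E : (N,) electric field sampled at the x-nodes.
    Returns two (N, M) arrays with x_n - v_m*dt and v_m + E(x_n)*dt.
    """
    x_tilde = x[:, None] - v[None, :] * dt
    v_tilde = v[None, :] + E[:, None] * dt
    return x_tilde, v_tilde


x_t, v_t = foot_points(np.array([0.0, 1.0]), np.array([-1.0, 1.0]),
                       np.array([0.5, -0.5]), 0.1)
```

In a periodic setting the values of `x_tilde` would additionally be wrapped back into the spatial domain; that step is omitted here.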
4 Discretization of the Vlasov Equation


Given the time instants $t^k = k\Delta t = kT/K$ for any integer $k = 0, 1, \ldots, K$, we
consider the approximation of the unknowns f and E of problem (1)-(2), given by

\[ \left( f^{(k)}_{N,M}(x, v),\ E^{(k)}_N(x) \right) \simeq \left( f(t^k, x, v),\ E(t^k, x) \right), \qquad x \in \Omega_x, \quad v \in \mathbb{R}, \tag{13} \]

where the function $f^{(k)}_{N,M}$ belongs to $Y_{N,M}$ and the function $E^{(k)}_N$ belongs to $X_N$.
Concerning the density $\rho^{(k)}_N$ we define

\[ \rho^{(k)}_N(x) = \int_{\mathbb{R}} f^{(k)}_{N,M}(x, v)\, dv \simeq \rho(t^k, x). \tag{14} \]

Hence, at any time step k, we express $f^{(k)}_{N,M}$ in the following way

\[ f^{(k)}_{N,M}(x, v) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} c^{(k)}_{ij}\, B_i^{(N)}(x)\, B_j^{(M)}(v)\, \omega(v), \tag{15} \]

where $c^{(k)}_{ij} = f^{(k)}_{N,M}(x_i, v_j)$. At time $t = 0$, we use the initial condition $c^{(0)}_{ij} =
f(0, x_i, v_j) = \bar f(x_i, v_j)$.
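Suppose that $E^{(k)}_N$ is given at step k. According to [8], we write

\[ E^{(k)}_N(x) = -\sum_{n=1}^{N/2} \frac{1}{n} \left[ \hat a^{(k)}_n \sin(nx) - \hat b^{(k)}_n \cos(nx) \right], \tag{16} \]

where the discrete Fourier coefficients $\hat a^{(k)}_n$ and $\hat b^{(k)}_n$, $n = 1, 2, \ldots, N/2$, are suitably
related to those of $\rho^{(k)}_N$.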
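
One way to realize the relation between the Fourier coefficients of the field and those of the density is to integrate $\partial E/\partial x = 1 - \rho$ mode by mode with an FFT, dropping the zero mode so that the field has zero mean as required by (3). The sketch below is an assumption-laden illustration, not the implementation of [8]; the function name `electric_field` and the use of `numpy.fft` are choices made here.

```python
import numpy as np


def electric_field(rho, length=2.0 * np.pi):
    """Zero-mean periodic solution of dE/dx = 1 - rho on a uniform grid."""
    n = rho.size
    rhs_hat = np.fft.fft(1.0 - rho)
    # Angular wavenumbers matching the FFT ordering.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    e_hat = np.zeros_like(rhs_hat)
    nonzero = k != 0
    # Integration in x is division by i*k for every nonzero mode;
    # the k = 0 mode stays zero, which enforces a zero-mean field.
    e_hat[nonzero] = rhs_hat[nonzero] / (1j * k[nonzero])
    return np.fft.ifft(e_hat).real


x = 2.0 * np.pi * np.arange(32) / 32
E = electric_field(1.0 + 0.1 * np.cos(x))
```

For the density $1 + 0.1\cos x$ used above, the field returned is $-0.1 \sin x$, whose derivative is indeed $1 - \rho$.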

By taking $\tau = t - \Delta t$ in (7), we define $\tilde x_{nm} = x_n - v_m \Delta t$ and $\tilde v_{nm} =
v_m + E^{(k)}_N(x_n) \Delta t$. The distribution function f is expected to remain constant along
the characteristics. The most straightforward discretization method is obtained by
advancing the coefficients according to the approximation

\[ f^{(k+1)}_{N,M}(x_n, v_m) \simeq f^{(k)}_{N,M}(\tilde x_{nm}, \tilde v_{nm}). \tag{17} \]

This states that the value of $f^{(k+1)}_{N,M}$ at the grid points and time step $(k + 1)\Delta t$
is assumed to correspond to the previous value at time $k\Delta t$, recovered by going
backwards along the characteristics. To compute $\tilde v_{nm}$, we should use $E^{(k+1)}_N(x_n)$
instead of $E^{(k)}_N(x_n)$. However, the distance between these two quantities is of the
order of $\Delta t$, so that the replacement has no practical effects on the accuracy of
first-order methods. Between each step k and the successive one, we need to update the
electric field. This can be done by using the Gaussian quadrature formula in (14), so
obtaining

\[ \rho^{(k)}_N(x_i) = \sum_{j=0}^{M-1} c^{(k)}_{ij}\, w_j, \tag{18} \]
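where $w_j$, for $j = 0, \ldots, M - 1$, are the quadrature weights. Afterwards, in order
to compute the new point-values of the electric field $E_N$, it is necessary to
integrate $\rho_N$. By using approximation (12) in (17), we end up with the first-order
explicit scheme of Euler type:

\[ c^{(k+1)}_{nm} = c^{(k)}_{nm} + \Delta t\, \Phi^{(k)}_{nm}, \tag{19} \]

where

\[ \Phi^{(k)}_{nm} = -v_m \sum_{i=0}^{N-1} d_{ni}^{(N,1)} c^{(k)}_{im} + E^{(k)}_N(x_n)\, \frac{1}{\omega(v_m)} \sum_{j=0}^{M-1} d_{mj}^{(M,1)} c^{(k)}_{nj}. \tag{20} \]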


The parameter $\Delta t$ must satisfy a suitable CFL condition, which is obtained by
requiring that the point $(\tilde x_{nm}, \tilde v_{nm})$ falls inside the box $]x_{n-1}, x_{n+1}[\, \times\, ]v_{m-1}, v_{m+1}[$.
A straightforward way to increase the time accuracy is to use a multistep discretization
scheme such as the second-order accurate two-step BDF scheme. We have

\[ f^{(k+1)}_{N,M}(x_n, v_m) \simeq \frac{4}{3}\, f^{(k)}_{N,M}(\tilde x_{nm}, \tilde v_{nm}) - \frac{1}{3}\, f^{(k-1)}_{N,M}(\tilde{\tilde x}_{nm}, \tilde{\tilde v}_{nm}), \tag{21} \]

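where $(\tilde x_{nm}, \tilde v_{nm})$ is the point obtained from $(x_n, v_m)$ going back one step $\Delta t$
along the characteristic lines. Similarly, the point $(\tilde{\tilde x}_{nm}, \tilde{\tilde v}_{nm})$ is obtained by going
two steps back along the characteristic lines, i.e., by using $2\Delta t$ instead of $\Delta t$ when
computing $\tilde x_{nm}$ and $\tilde v_{nm}$. Despite the fact that a BDF scheme is commonly presented
as an implicit technique, in our context (f constant along the characteristics) it
assumes the form of an explicit method. In terms of the coefficients, we end up with
the scheme

\[ c^{(k+1)}_{nm} = \frac{4}{3} \left( c^{(k)}_{nm} + \Delta t\, \Phi^{(k)}_{nm} \right) - \frac{1}{3} \left( c^{(k-1)}_{nm} + 2\Delta t\, \Phi^{(k-1)}_{nm} \right) \]
\[ = \frac{4}{3}\, c^{(k)}_{nm} - \frac{1}{3}\, c^{(k-1)}_{nm} + \frac{2}{3}\, \Delta t \left[ -v_m \sum_{i=0}^{N-1} d_{ni}^{(N,1)} \left( 2c^{(k)}_{im} - c^{(k-1)}_{im} \right) + E_N(x_n) \sum_{j=0}^{M-1} d_{mj}^{(M,1)} \left( 2c^{(k)}_{nj} - c^{(k-1)}_{nj} \right) \frac{1}{\omega(v_m)} \right]. \tag{22} \]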
From theoretical considerations and the experiments in [8], it turns out that the
above method is actually second-order accurate in At. Higher order schemes 
can be obtained with similar principles. All the above schemes guarantee mass 
conservation (see (4) for the continuous case), which is a crucial physical property. 
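
The second-order behavior of the two-step update $c^{(k+1)} = \tfrac{4}{3}(c^{(k)} + \Delta t\,\Phi^{(k)}) - \tfrac{1}{3}(c^{(k-1)} + 2\Delta t\,\Phi^{(k-1)})$ can be checked on a toy problem. The sketch below applies it to a single Fourier mode of the linear advection equation $f_t + a f_x = 0$, for which $\Phi = -\mathrm{i}\,a\,\kappa\,c$; this test problem, the exact start-up value and the names `bdf2_step` and `advect_mode` are assumptions made for illustration only and are not taken from [8].

```python
import cmath


def bdf2_step(c_k, c_km1, phi_k, phi_km1, dt):
    """Explicit two-step BDF update along the characteristics."""
    return (4.0 / 3.0) * (c_k + dt * phi_k) - (1.0 / 3.0) * (c_km1 + 2.0 * dt * phi_km1)


def advect_mode(dt, t_final=1.0, a=1.0, kappa=1.0):
    """Error at t_final for one Fourier mode of f_t + a f_x = 0."""
    n_steps = round(t_final / dt)

    def phi(c):
        # Spatial operator acting on a single mode exp(i*kappa*x).
        return -1j * a * kappa * c

    c_prev = 1.0 + 0.0j                       # value at t = 0
    c_curr = cmath.exp(-1j * a * kappa * dt)  # exact value at t = dt (start-up)
    for _ in range(n_steps - 1):
        c_prev, c_curr = c_curr, bdf2_step(c_curr, c_prev, phi(c_curr), phi(c_prev), dt)
    exact = cmath.exp(-1j * a * kappa * n_steps * dt)
    return abs(c_curr - exact)


ratio = advect_mode(0.01) / advect_mode(0.005)
```

Halving the time step should reduce the error by a factor close to four, which is the signature of second-order accuracy in $\Delta t$.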

For practical purposes, it is advisable to make the change of variable $f(t, x, v) =
p(t, x, v) \exp(-v^2)$ in the Vlasov equation, so obtaining

\[ \frac{\partial p}{\partial t} + v\,\frac{\partial p}{\partial x} - E(t, x) \left[ \frac{\partial p}{\partial v} - 2vp \right] = 0, \qquad t \in (0, T], \quad x \in \Omega_x, \quad v \in \mathbb{R}. \tag{23} \]

At time step k, the function $p(t^k, x, v)$ is approximated by a function $p^{(k)}_{N,M}(x, v)$ in
such a way that $p^{(k)}_{N,M}\, e^{-v^2}$ belongs to the finite dimensional space $Y_{N,M}$.

A generalization consists in introducing a real parameter $\alpha$ and assuming that
the weight function is $\omega(v) = \exp(-\alpha^2 v^2)$. The approximation scheme can be
easily adjusted by modifying nodes and weights of the Gaussian formula, through
a multiplication by suitable constants. The difficulty in the implementation is
practically the same, but, as observed in [9], the results are quite sensitive to the
variation of $\alpha$.


5 Numerical Experiments 


The numerical scheme proposed here is validated on the standard two-stream
instability benchmark test. We consider the Vlasov-Poisson problem (1)-(2) where
we set $\Omega_x = [0, 4\pi[$, $\Omega_v = [-5, 5]$. The initial solution is given by

\[ \bar f(x, v) = \frac{1}{2a\sqrt{2\pi}} \left[ G_R(v) + G_L(v) \right] \left[ 1 + \epsilon \cos(\kappa x) \right], \tag{24} \]

where $G_R(v) = e^{-\bar\alpha^2 (v - \bar\beta)^2}$ and $G_L(v) = e^{-\bar\alpha^2 (v + \bar\beta)^2}$ are two Gaussians centered
symmetrically at the points $v = \pm\bar\beta$. The parameters for (24) are: $a = 1/\sqrt{8}$,
$\epsilon = 10^{-3}$, $\kappa = 0.5$, $\bar\alpha = 2$, $\bar\beta = \beta = 1$.
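
The two-stream initial datum can be sampled on a grid and checked against the constraint (3), which requires the density to have unit mean. The sketch below assumes the form $\bar f = \frac{1}{2a\sqrt{2\pi}}[G_R + G_L][1 + \epsilon\cos(\kappa x)]$ with Gaussians $e^{-\bar\alpha^2 (v \mp \bar\beta)^2}$, $a = 1/\sqrt{8}$ and $\bar\alpha = 2$; the prefactor in particular is an assumption on our side, chosen so that the velocity integral equals $1 + \epsilon\cos(\kappa x)$. The function name and the uniform v-grid used for the check are hypothetical.

```python
import numpy as np


def two_stream_initial(x, v, a=1.0 / np.sqrt(8.0), eps=1e-3, kappa=0.5,
                       alpha_bar=2.0, beta_bar=1.0):
    """Sample the assumed two-stream initial datum on an (x, v) tensor grid."""
    g_r = np.exp(-alpha_bar**2 * (v - beta_bar) ** 2)
    g_l = np.exp(-alpha_bar**2 * (v + beta_bar) ** 2)
    perturbation = 1.0 + eps * np.cos(kappa * x)
    return perturbation[:, None] * (g_r + g_l)[None, :] / (2.0 * a * np.sqrt(2.0 * np.pi))


x = np.linspace(0.0, 4.0 * np.pi, 16, endpoint=False)
v = np.linspace(-8.0, 8.0, 2001)
f0 = two_stream_initial(x, v)
# Uniform-grid sum as a simple quadrature in v (the tails are negligible).
rho0 = f0.sum(axis=1) * (v[1] - v[0])
```

With these values `rho0` agrees with $1 + \epsilon\cos(\kappa x)$ to high accuracy, so its mean over the period $[0, 4\pi[$ is one.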

In all the experiments that follow, we integrate up to time Т = 30 using the 
second-order BDF scheme with a suitably small time step, in order to guarantee 
stability and a good accuracy. In this way we can concentrate our attention to the 
spectral approximation in the variable x and v. A study of the convergence rate in 
time of the proposed numerical scheme can be found in [8]. First of all, in Fig. 1 we 
show the results at time T = 30 of the solution recovered by the Fourier-Fourier 
method, by choosing N = 2°, М = 2° and time step equal to At = 0.00125. 
This will be the reference figure for the subsequent comparisons. In addition, we show
the corresponding time evolution of $|\hat a^{(k)}_1|$, the first Fourier mode of the electric
field $E^{(k)}_N$ in (16). The behavior of this last quantity is predicted by theoretical
considerations, and the slope of the “segment” starting at T = 15 agrees with the
expected value [1, Chapter 5].

Fig. 1 Two-stream instability test: approximated distribution function at time T = 30 obtained by
using the Fourier-Fourier method with N = 2°, M = 2°, $\Delta t = 0.00125$, and the corresponding
time evolution of the first Fourier mode of the electric field $E^{(k)}_N$, i.e. $|\hat a^{(k)}_1|$ in (16)

As done in [9], we perform a series of experiments using fewer degrees of freedom
than those actually necessary to resolve the equation accurately. In practice, we
set $N = M = 2^4$. In this way, we can for instance detect what happens by
varying the parameters $\alpha$ and $\beta$. Of course, if we increase the number of degrees
of freedom, the numerical solution improves and cannot be distinguished from the
reference one shown in Fig. 1. The purpose in [9] was to check what happens by
varying the parameter $\alpha$ in the Hermite weight $\exp(-\alpha^2 v^2)$. The conclusions are
that the approximate solution is very sensitive to the choice of $\alpha$ and that there
are values of $\alpha$ that perform better than others. In general these values are those
belonging to a neighbourhood of $\alpha = 1$. Moreover, in [9], we note that keeping $\alpha$
constantly equal to the value that better fits the initial datum (i.e. $\alpha = \bar\alpha = 2$ for (24))
may create instability as time increases. For these reasons, since at the moment
a practical algorithm able to vary $\alpha$ in a dynamical way during the computations is
not available, in the numerical experiments that follow we fix $\alpha = 1$, while varying
$\beta$.

Fig. 2 Two-stream instability test: approximated distribution function at time T = 30 obtained
by using the Fourier-Hermite method with $N = M = 2^4$, $\Delta t = 0.01$, $\alpha = 1$ (left panel) and the
corresponding time evolution of the first Fourier mode of the electric field $E^{(k)}_N$, i.e. $|\hat a^{(k)}_1|$ in (16)
(right panel) when $\beta = 0.5$ (top), $\beta = 1$ (center) and $\beta = 1.5$ (bottom)

Due to the particular initial condition, we adopt a two-species decomposition of
the Vlasov equation, where the distribution function is given by the sum of two
electron distribution functions, i.e., $f = f_L + f_R$. These distribution functions
refer to the two initial electron distributions, so that $f_R = p_R G_R$ and $f_L =
p_L G_L$, where $p_L$ and $p_R$ are given polynomials. We consider the two systems of
electrons described by the distribution functions $f_L$ and $f_R$ at the initial time as
distinct plasma species that maintain their diversity throughout the whole numerical
simulation. Therefore, we can split the Vlasov equation into two equations that
are still of Vlasov type and are solvable independently, although they are coupled
through the same electric field, which depends on the total charge density. This
amounts to approximating two independent equations of the same type as that given
in (23), respectively shifted by $\pm\beta$, i.e.


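\[ \frac{\partial p_R}{\partial t} + v\,\frac{\partial p_R}{\partial x} - E(t, x) \left[ \frac{\partial p_R}{\partial v} - 2\alpha^2 (v - \beta)\, p_R \right] = 0, \tag{25} \]

\[ \frac{\partial p_L}{\partial t} + v\,\frac{\partial p_L}{\partial x} - E(t, x) \left[ \frac{\partial p_L}{\partial v} - 2\alpha^2 (v + \beta)\, p_L \right] = 0. \tag{26} \]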


The two unknowns are then coupled through the density function as in (2). 

The plots of Fig. 2 show the numerical distribution function at time T = 30
obtained by using the Fourier-Hermite method with $N = M = 2^4$, $\Delta t = 0.01$,
$\alpha = 1$ and different values of the parameter $\beta$ (i.e. $\beta = 0.5$, $\beta = 1$ and $\beta = 1.5$),
together with the corresponding time evolution of the (log of the) first Fourier mode
of the electric field $E^{(k)}_N$, i.e. $|\hat a^{(k)}_1|$ in (16).

The distribution functions presented in the left column of Fig. 2 are visibly and
significantly different depending on $\beta$, while the first Fourier mode of the electric
field shown in the right column seems to be less affected. These differences
confirm in practice that the choice of the Hermite weight functions $\omega(v) = \exp(-\alpha^2 (v \mp \beta)^2)$
is a crucial aspect of the method (see also [11, 12, 17, 22]). This conclusion is
heuristic. Unfortunately, there is not enough space for a deeper quantitative analysis
in these pages. The question, however, deserves further investigation. Moreover, it
would be advisable to develop appropriate algorithms allowing for the automatic
adjustment of both parameters $\alpha$ and $\beta$ during the time advancing procedure, in
order to optimize the performance.
HPS Accelerated Spectral Solvers
for Time Dependent Problems: Part I,
Algorithms


Tracy Babb, Per-Gunnar Martinsson, and Daniel Appelö 


1 Introduction 


This chapter, part two in a two part series, describes a sequence of numerical
experiments demonstrating the performance of a highly computationally efficient
solver for equations of the form

\[ \kappa\, \frac{\partial u}{\partial t} = \mathcal{L} u(x, t) + g(u, x, t), \qquad x \in \Omega, \quad t > 0, \tag{1} \]

with initial data $u(x, 0) = u_0(x)$. Here $\mathcal{L}$ is an elliptic operator acting on a fixed
domain $\Omega$ and $g$ collects lower order, possibly nonlinear, terms. We take $\kappa$ to be real or
imaginary, allowing for parabolic and Schrödinger type equations.

The “Hierarchical Poincaré-Steklov (HPS)” solver has already been demonstrated
to be a highly competitive spectrally accurate solver for elliptic problems [1, 4, 7]
and has also been used together with a class of exponential integrators [5], to evolve
solutions to hyperbolic differential equations. As just mentioned, the focus here is
on differential equations in the form (1) whose discretization leads to stiff systems
of ODEs that can beneficially be advanced in time using Explicit, Singly Diagonally
Implicit Runge-Kutta (ESDIRK) methods. ESDIRK methods offer the advantages
of stiff accuracy and L-stability and are well suited for the HPS algorithm as they
only require a single matrix factorization. They are also easily combined with
explicit Runge-Kutta methods, leading to so-called Additive Runge-Kutta (ARK)
methods [6].

To this end we investigate the stability and accuracy that is obtained when 
combining high-order time-stepping schemes with the HPS method for solving 
elliptic equations. We restrict attention to relatively simple geometries (rectangles) 
but note that the method can without difficulty be generalized to domains that can 
be expressed as a union of rectangles, possibly mapped via curvilinear smooth 
parameter maps. 

The rest of this chapter is organized as follows. In Sect. 2 we present results
illustrating that the order reduction phenomenon for DIRK methods observed in [8]
can be circumvented when formulating the time stepping in terms of slopes (with
boundary conditions differentiated in time) rather than formulating it in terms of
stage solutions. In Sect. 3 we present numerical results for the Schrödinger equation in
two dimensions and in Sect. 4 we present numerical results for a nonlinear problem,
viscous Burgers' equation in two dimensions. Finally, in Sect. 5 we summarize and
conclude. For a longer description of the method we refer to the first part of this
paper and to [2].


2 Time Dependent Boundary Conditions 


This section discusses time-dependent boundary conditions within the two different 
Runge-Kutta formulations. In particular, we investigate the order reduction that has 
been documented in [8] for implicit Runge-Kutta methods and earlier in [3] for 
explicit Runge-Kutta methods. 

In this first experiment, introduced in [8], we solve the heat equation in one
dimension

\[ u_t = u_{xx} + f(t), \qquad x \in [0, 2], \quad t > 0. \tag{2} \]

We set the initial data, Dirichlet boundary conditions and the forcing f(t) so that
the exact solution is u(x, t) = cos(t). This example is designed to eliminate the effect
of the spatial discretization, with the solution being constant in space, and allows for
the study of possible order reduction near the boundaries.

We use the HPS scheme in space and use 32 leafs with p = 32 Chebyshev nodes
per leaf. We apply the third, fourth, and fifth order ESDIRK methods from [6]. We
consider solving for the intermediate solutions, or as we refer to it below “the $u_i$
formulation”, with the boundary condition enforced as $u_i^n = \cos(t_n + c_i \Delta t)$. We
also consider solving for the stages, which we refer to as “the $k_i$ formulation”, with
boundary conditions imposed as $k_i^n = -\sin(t_n + c_i \Delta t)$.

Order reduction for time dependent boundary conditions has been studied both
in the context of explicit Runge-Kutta methods in e.g. [3] and more recently for
implicit Runge-Kutta methods in [8]. In [8] the authors report observed orders of
accuracy equal to two (for the solution u) for DIRK methods of order 2, 3, and 4 for
the problem (2) discretized with a finite difference method on a fine grid (the spatial
errors are zero) using the $u_i$ formulation.

Fig. 1 The error in solving (2). Results are for a third order ESDIRK. (a) Displays the single
step error which converges with fourth order of accuracy. (b) Displays the global error at t = 1
converging at third order. Both errors converge at one order higher than what is expected from the
analysis in [8]

Fig. 2 The error in solving (2). Results are for a fifth order ESDIRK. (a) Displays the single
step error which converges with fourth order of accuracy. (b) Displays the global error at t = 1
converging at third order. Both errors converge at one order higher than what is expected from the
analysis in [8] but still lower than expected

Figures 1 and 2 show the error for the third and fifth order ESDIRK methods,
respectively, as a function of x for a single step and at the final time t = 1. Figure 3
shows the maximum error for the third, fourth, and fifth order methods as a function
of time step $\Delta t$ after a single step and at the final time t = 1.

In general, for a method of order p we expect that the single step error decreases
as $\Delta t^{p+1}$ while the global error decreases as $\Delta t^p$. However, with time dependent
boundary conditions implemented as $u_i^n = \cos(t_n + c_i \Delta t)$ the results in [8] indicate
that the rate of convergence will not exceed two for the single step or global error.

Fig. 3 The maximum error (here denoted $l_\infty$) in solving (2) for the third, fourth, and fifth order
ESDIRK methods for a sequence of decreasing time steps. (a, c) are errors after one time step and
(b, d) are the errors at time t = 1. The top row is for the $u_i$ formulation and the bottom row is for
the $k_i$ formulation. Note that the $k_i$ formulation is free of order reduction

The results for the third order method (p = 3) displayed in Fig. 1 show that the
single step error decreases as $\Delta t^{p+1}$ while the global error decreases as $\Delta t^p$, which
is better than the results documented in [8]. However, we still see that a boundary
layer appears to be forming, but it is of the same order as the error away from the
boundary. The results for the fifth order method (p = 5) displayed in Fig. 2 show
that the single step error decreases as $\Delta t^4$ while the global error decreases as $\Delta t^3$,
which is still better than the results documented in [8]. However, the boundary layer
is giving order reduction from $\Delta t^{p+1}$ for the single step error and $\Delta t^p$ for the global
error. We note that our observations differ from those in [8] but that this possibly
can be attributed to the use of an ESDIRK method rather than a DIRK method.

We repeat the experiment but now we use the $k_i$ formulation for Runge-Kutta
methods and for the boundary condition we enforce $k_i^n = -\sin(t_n + c_i \Delta t)$. The
intuition here is that $k_i^n$ is an approximation to $u_t$ at time $t_n + c_i \Delta t$ and we use
the value of $u_t$ for the boundary condition of $k_i^n$. Intuitively we expect that the fact
that we reduce the index of the system of differential algebraic equations in the $u_i$
formulation by differentiating the boundary conditions can restore the design order
of accuracy.

In the previous examples the Runge-Kutta method introduced an error on the
interior while the solution on the boundary was exact. If the error on the boundary
is on the same order of magnitude as the error on the interior then the error in $u_{xx}$ is
of the correct order, but when the value of u is exact on the boundary it introduces
a larger error in $u_{xx}$. In the $k_i$ formulation, for each intermediate stage we find
$u_{xx} = 0$ and then $k_i^n = -\sin(t_n + c_i \Delta t)$ on the interior and on the boundary. So
at a fixed time the solution is constant in x and a boundary layer does not form.
Additionally, the error is constant in x at any fixed time and for a method of order
p we obtain the expected behavior where the single step error decreases as $\Delta t^{p+1}$
and the global error decreases as $\Delta t^p$.

Figure 3 shows the maximum error for the third, fourth, and fifth order methods
as a function of time step $\Delta t$ after a single step and at the final time t = 1. The results
show that the methods behave exactly as we expect. The single step error behaves as
$\Delta t^{p+1}$ for the third and fifth order methods and $\Delta t^{p+2}$ for the fourth order method.
The fourth order method gives sixth order error in a single step because the exact
solution is u(x, t) = cos(t), which has every other derivative equal to zero at t = 0
and for a single step we start at t = 0. The global error behaves as $\Delta t^p$ for each
method.
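
Rates such as those quoted above are typically estimated from errors measured at successive time steps. The following sketch shows the standard two-point estimate, $\log(e_1/e_2)/\log(\Delta t_1/\Delta t_2)$; the function name `observed_orders` and the synthetic data are illustrative assumptions and do not reproduce the values behind Figs. 1 to 3.

```python
import math


def observed_orders(errors, steps):
    """Estimate convergence orders from consecutive (error, step) pairs."""
    orders = []
    for i in range(len(errors) - 1):
        # Assuming e ~ C * dt**p, the ratio of two errors isolates p.
        p = math.log(errors[i] / errors[i + 1]) / math.log(steps[i] / steps[i + 1])
        orders.append(p)
    return orders


# Synthetic third-order data: e = 2 * dt**3.
steps = [0.1, 0.05, 0.025]
errors = [2.0 * dt**3 for dt in steps]
rates = observed_orders(errors, steps)
```

For data that follow $e = C\,\Delta t^p$ exactly, every entry of `rates` equals p; in practice the estimates only approach p as the time step decreases.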


3 Schrödinger Equation


Next we consider the Schrödinger equation for $u = u(x, y, t)$

\[ i\hbar\, u_t = -\frac{\hbar^2}{2M}\, \Delta u + V(x, y)\, u, \qquad t > 0, \quad (x, y) \in [x_l, x_r] \times [y_b, y_t], \]
\[ u(x, y, 0) = u_0(x, y). \tag{3} \]

Here we nondimensionalize in a way equivalent to setting $M = 1$, $\hbar = 1$ in the
above equation. We choose the potential to be the harmonic potential

\[ V(x, y) = \frac{1}{2} (x^2 + y^2). \]

This leads to an exact solution

\[ u(x, y, t) = A\, e^{-it}\, e^{-\frac{x^2 + y^2}{2}}, \tag{4} \]

where we set $A = 1/\sqrt{\pi}$ and solve until $t = 2\pi$ on the domain $(x, y) \in [-8, 8]^2$.
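
As a quick consistency check, one can verify by hand that a Gaussian of the form $A e^{-it} e^{-(x^2+y^2)/2}$ solves the nondimensional equation with the harmonic potential $V = (x^2 + y^2)/2$. The short derivation below assumes $\hbar = M = 1$ and writes $r^2 = x^2 + y^2$.

```latex
\begin{align*}
u &= A\, e^{-it}\, e^{-r^2/2}, \qquad r^2 = x^2 + y^2, \\
i\, u_t &= i\,(-i)\, u = u, \\
u_{xx} &= (x^2 - 1)\, u, \qquad u_{yy} = (y^2 - 1)\, u
  \quad\Longrightarrow\quad \Delta u = (r^2 - 2)\, u, \\
-\tfrac{1}{2} \Delta u + V u &= -\tfrac{1}{2}(r^2 - 2)\, u + \tfrac{1}{2} r^2 u = u = i\, u_t .
\end{align*}
```

With $A = 1/\sqrt{\pi}$ the solution is normalized, since $\int_{\mathbb{R}^2} |u|^2\, dx\, dy = A^2 \pi = 1$.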
Fig. 4 Error in the Schrödinger equation as a function of leaf size. The exact solution is given in
Eq. (4)


Table 1 Estimated rates of convergence for different Runge-Kutta methods and different orders of
approximation

p       | 4    | 6    | 8    | 10   | 12
ESDIRK3 | 2.59 | 5.73 | 7.72 | 9.69 | 11.47
ESDIRK4 | 1.89 | 6.47 | 7.82 | 9.76 | 11.69
ESDIRK5 | 1.84 | 4.42 | 7.69 | 9.71 | 11.48


The computational domain is subdivided into $n_x \times n_y$ panels with $p \times p$ points
on each panel. To begin, we study the order of accuracy with respect to leaf size. To
eliminate the effect of time-stepping errors we scale $\Delta t = h^{p/q_{RK}}$, where $q_{RK}$ is the
order of the Runge-Kutta method. In Fig. 4 we display the errors as a function of the
leaf size for p = 4, 6, 8, 10, 12, 16 and for the third and fifth order Runge-Kutta
methods ($q_{RK} = 3, 5$). The rates of convergence are found for all three Runge-Kutta
methods and summarized in Table 1. As can be seen from the table, p = 4
appears to converge at second order, while for higher p we generally observe a rate
of convergence approaching p.

In this problem the efficiency of the method is limited by the order of the
Runge-Kutta methods. However, as our methods are unconditionally stable we
may enhance the efficiency by using Richardson extrapolation to achieve a highly
accurate solution in time. We solve the same problem, but now we fix p = 12
and take $5 \cdot 2^n$ time steps, with $n = 0, 1, \ldots, 5$. For the third order ESDIRK
method we use 60 x 60 leaf boxes. For the fourth order ESDIRK method we use
90 x 90 leaf boxes. For the fifth order ESDIRK method we use 120 x 120 leaf boxes.
Table 2 shows that we can easily achieve much higher accuracy by using Richardson
extrapolation.
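
A single Richardson extrapolation step combines two results computed with time steps $\Delta t$ and $\Delta t/2$ by a method of order q so that the leading error term cancels. The sketch below shows only this generic step on synthetic data; the function name `richardson` is hypothetical, and the actual sequence of extrapolations behind Table 2 is not reproduced here.

```python
def richardson(coarse, fine, order):
    """One Richardson step for results obtained with steps dt and dt/2.

    If the error behaves like C * dt**order, the combination below
    removes that leading term.
    """
    return fine + (fine - coarse) / (2.0**order - 1.0)


# Synthetic third-order approximation of the value 1: A(dt) = 1 + dt**3.
def approx(dt):
    return 1.0 + dt**3


improved = richardson(approx(0.1), approx(0.05), order=3)
```

Here the synthetic error is exactly proportional to $\Delta t^3$, so `improved` recovers the limit value up to rounding; for a real solver the extrapolated value gains (at least) one order of accuracy.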

Finally, we solve a problem without an analytic solution. In this problem the 
initial data 


u(x, y, t) = З sin(x) ѕіп(у)е  *», 
Table 2 Estimated errors at the final time after Richardson extrapolation
акк! 


extrapolations NEGERI 6 


з 140—5 
à 1.20 11) 
s 508 CID 


The notation d(-p) means $d \cdot 10^{-p}$


Table 3 Errors computed against a p and h refined solution


mes p u k ie [m 
s 245 СӨ) 
ке 9 [зз «ds. 
D SEITE] 
ке f=» и [60 60 [68 


The errors are maximum errors at the final time t = 4. The notation d(-p) means $d \cdot 10^{-p}$


interacts with the weak and slightly non-symmetric potential 
V(x, y) = 1— e ©4095”, 


allowing the solution to reach the boundary where we impose homogeneous Dirichlet
conditions.

We evolve the solution until time t = 4 using p = 8 and 10 and 2, 4, 8, 16 and
32 leaf boxes in each direction of a domain of size 12 x 12. The errors computed
against a reference solution with p = 12 and with 32 leaf boxes can be found in
Table 3.

In Fig. 5 we display snapshots of the magnitude of the solution at the initial time
t = 0, the intermediate times $t \approx 1.07$, $t \approx 1.68$ and at the final time t = 4.0.


4 Burgers’ Equation in Two Dimensions 


As a first step towards a full blown flow solver we solve Burgers' equation in two
dimensions using the additive Runge-Kutta methods described in the first part of
this paper. Precisely, we solve the system

\[ \mathbf{u}_t + \mathbf{u} \cdot \nabla \mathbf{u} = \epsilon\, \Delta \mathbf{u}, \qquad \mathbf{x} \in [-\pi, \pi]^2, \quad t > 0, \tag{5} \]

where $\mathbf{u} = [u(x, y, t), v(x, y, t)]^T$ is the vector containing the velocities in the x
and y directions.

The first problem we solve uses the initial condition $\mathbf{u} = 5[-y, x]^T \exp(-3r^2)$
and the boundary conditions are taken to be no-slip boundary conditions on all sides.
We solve the problem using 24 x 24 leafs, p = 24, $\epsilon = 0.005$, and the fifth order
ARK method found in [6]. We use a time step of k = 1/80 and solve until time
$t_{\max} = 5$. The low viscosity combined with the initial condition produces a rotating
flow resembling a vortex that steepens up over time.

Fig. 5 Snapshots of the magnitude of the solution at the initial time (a) t = 0, the intermediate
times (b) $t \approx 1.07$, (c) $t \approx 1.68$ and at the final time (d) t = 4.0

In Fig. 6 we can see the velocities at times t = 0.5 and t = 1. The fluid rotates 
and expands out and eventually forms a shock like transition. This creates a sharp 
flow region with large gradients resulting in a flow that may be difficult to resolve 
with a low order accurate method. These sharp gradients can be seen in the two 
vorticity plots in Fig. 6 along with the speed and vorticity plots in Fig. 7. 

In our second experiment we consider a cross stream of orthogonal flows. We 
use an initial condition of 


8 8 
u= [8ye È) | cg 78) qr. (6) 


, 


and time independent boundary conditions that are compatible with the initial data. 
This initial horizontal velocity drops to zero quickly as we approach |y| = 0.5.
For |y| < 0.5 the exponential term approaches exp(0) and the velocity behaves like
u = 8y. The flow has changed slightly by t = 0.06, but we can see in Fig. 7 that the
flow is moving to the right for y > 0 and to the left for y < 0
and all significant behavior is in |y| < 0.5. A plot of the velocity v would show
similar behavior. We also use 24 x 24 leafs, p = 24, $\epsilon = 0.025$, k = 1/200, and
$t_{\max} = 0.75$. We show plots of the horizontal velocity u and the dilatation at time
t = 0.06 and t = 0.15. We only show plots before time t = 0.15 when the fluid
is hardest to resolve and we observe that after t = 0.15 the cross streams begin to
dissipate. This problem contains sharp interfaces inside $\mathbf{x} \in [-0.5, 0.5]^2$.


5 Conclusion 


In this two part series we have demonstrated that the spectrally accurate Hierarchical
Poincaré-Steklov solver can be easily extended to handle time dependent PDE
problems with a parabolic principal part by using ESDIRK methods. We have 
outlined the advantages of the two possible ways to formulate implicit Runge-Kutta 
methods within the HPS scheme and demonstrated the capabilities on both linear 
and non-linear examples. 

There are many avenues for future work, for example: 


* Extension of the solvers to compressible and incompressible flows. 

* Application of the current solvers to inverse and optimal design problems, 
in particular for problems where changes in parameters do not require new 
factorizations. 
for Interface Problems: Theory and 
Implementations 


Yuanming Xiao, Fangman Zhai, Linbo Zhang, and Weiying Zheng 


1 Introduction 


Interface problems, which involve partial differential equations with discontinuous 
coefficients across certain interfaces, are often encountered in fluid 
dynamics, electromagnetics and materials science. Because of the low global 
regularity and the irregular geometry of the interface, standard numerical 
methods that are efficient for smooth solutions usually lose accuracy 
across the interface. 

For an arbitrarily shaped interface Γ, it is known that an optimal or nearly optimal 
convergence rate can be recovered if body-fitted finite element meshes are used, 
see e.g. [6, 8, 20, 29]. Here, by “body-fitted meshes” we mean that an element of 
the underlying mesh is required to intersect the interface only through its 
boundaries (Fig. 1). Unfortunately, when the geometry is complex, this usually leads 
to a nontrivial interface meshing problem. Therefore, numerous modified finite 
difference methods based only on simple Cartesian grids have been proposed in the 
literature. We refer to the immersed boundary method [24], the immersed interface 
method [17, 18], the ghost fluid method [21], and the references therein. In the 
finite element setting, we refer to the immersed finite element method 
[7, 11, 19], the multiscale finite element method [9], and the penalty finite element 
method [1]. 

Fig. 1 A body-fitted, shape-regular mesh 

In the past decade, a combination of the extended finite element method (XFEM) 
with the Nitsche scheme has become a popular discretization method. As the first 
attempt, an unfitted finite element method was proposed in [13] which can be 
viewed as a linear and consistent modification of [1]. This approach has motivated 
a number of works, e.g., the unfitted finite element method [4, 5, 12], the ghost 
penalty method [2, 3], and the unfitted discontinuous Galerkin methods [22]. Although 
significant progress has been made in the error analysis of some methods, the 
development of high-order accurate unfitted FEMs with rigorous error analysis is 
still challenging. We refer to the works [14-16, 22, 27, 28], which claim high-order 
approximations. In [22], an hp-unfitted discontinuous Galerkin method for 
Problem (1) was considered, and optimal h-convergence for arbitrary p was shown 
for the two-dimensional case in the energy norm and in the $L^2$-norm. With an 
extra flux penalty term applied on the interface, [27] gave better hp a priori error 
estimates in both two and three dimensions. In [15, 16], an isoparametric finite 
element method with a high-order geometrical approximation of level set domains 
was presented. The analysis reveals optimal order error bounds with respect to h for 
the geometry approximation and for the finite element approximation. In [14, 28], 
various issues related to unfitted methods were addressed, including the dependence 
of error estimates on the diffusion coefficients, the condition number of the discrete 
system, and the choice of stabilization parameters. 

The Nitsche-XFEM can be interpreted as applying interior penalty (IP) methods 
on the interface, and our method falls into this category. The major step in our variant 
is an appropriate choice of the mesh- and geometry-dependent weights in the average 
(see (6)), which leads to trace and inverse inequalities for possibly degenerate 
sub-elements (see (9)). We note that in our approach, the penalization is applied only 
to the jump of the solution values across the interface (compared with the bilinear 
form in [27]). The optimal h-convergence rate for arbitrary high-order discretizations 
in the energy norm and the $L^2$-norm is proved regardless of the dimension. We refer to 
[14-16] for similar estimates with respect to h and to [27] for a refined version with 
respect to both h and p. 
Efficient implementations of this method are then discussed in two aspects. We 
first consider an optimal multigrid solver for the generated linear system. We use the 
continuous FE space as a “background” subspace, with some smoothing operations 
added near the interface, to formulate a nested geometrical multigrid method. We 
prove the optimality of this special multiplicative multigrid method, which means 
the method converges uniformly with respect to the mesh size, and is independent 
of the location of the interface relative to the meshes. Since the assembling of the 
stiffness matrix will require integration over curved surfaces and volumes, we then 
implement a robust and arbitrarily high order numerical quadrature algorithm by 
transforming surface and volume integrals into multiple 1-D integrals. The code for 
the algorithm is freely available in the open source finite element toolbox Parallel 
Hierarchical Grid (PHG) [26]. We also refer to [23, 25] for different approaches to 
compute integrals on curved sub-elements and their curved boundaries. 

The layout of this paper is as follows. In Sect. 2 we introduce the XFE spaces 
and reformulate the interface problem (1) in DG schemes. The $H^1$- and $L^2$-error 
estimates of both schemes, which attain the optimal order of convergence 
with respect to the mesh size h, are given. In Sect. 3, we give an optimal multigrid 
method for the aforementioned DG-XFE schemes. Numerical examples in both 
two and three dimensions are reported in Sect. 4 to illustrate the high accuracy of 
the algorithm. 


2 XFE and DG Schemes for Interface Problems 


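We consider the following elliptic interface problem for $u$: Let $\Omega = \Omega_1 \cup \Gamma \cup \Omega_2$ 
be a bounded and convex polygonal or polyhedral domain in $\mathbb{R}^d$, $d = 2$ or $3$, where 
$\Omega_1$ and $\Omega_2$ are two subdomains of $\Omega$ separated by a $C^2$-smooth interface $\Gamma$ 
(see Fig. 2 for an illustration of a unit square that contains a circle as an interface), 

$$
\begin{aligned}
-\nabla\cdot(\alpha(x)\nabla u) &= f, && \text{in } \Omega_1 \cup \Omega_2,\\
[\alpha(x)\nabla u] &= g_N, && \text{on } \Gamma,\\
[u] &= g_D, && \text{on } \Gamma,\\
u &= 0, && \text{on } \partial\Omega.
\end{aligned}
\tag{1}
$$

Here $\alpha(x) = \alpha_i$ in $\Omega_i$, $i = 1, 2$, is a piecewise constant function on the partition $\Omega_1 \cup \Omega_2$. 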

Denote by $\{\mathcal{T}_h\}$ a family of conforming, quasi-uniform, and regular partitions 
of $\Omega$ into triangles and parallelograms (d = 2) or tetrahedra and parallelepipeds (d = 3). As $K$ is of 
regular shape, there is a constant $\gamma_0$ such that 

$$ h_K^d \le \gamma_0 |K|, \quad \forall K \in \mathcal{T}_h. \tag{2} $$

We define the set of all elements intersected by $\Gamma$ as $\mathcal{T}_h^\Gamma = \{K \in \mathcal{T}_h : |K \cap \Gamma| \ne 0\}$. 
Each $\mathcal{T}_h^\Gamma$ induces a partition of the interface $\Gamma$, which we denote by 
$\mathcal{E}_h^\Gamma = \{e_K : e_K = K \cap \Gamma,\ K \in \mathcal{T}_h^\Gamma\}$. For any $K \in \mathcal{T}_h^\Gamma$, let $K_i = K \cap \Omega_i$ denote the part of $K$ in $\Omega_i$ 
and $n_i$ be the unit outward normal vector on $\partial K_i$ with $i = 1, 2$. As $\Gamma$ is of class $C^2$, 
it is easy to prove that (cf. [6, 31]) each interface segment/patch $e_K$ is contained in a 
strip of width $\delta$ and satisfies 

$$ \delta \le \gamma_1 h_K^2 \quad\text{and}\quad |n(x) - n(y)| \le \gamma_1 h_K, \quad \forall x, y \in e_K. \tag{3} $$

Fig. 2 Domain $\Omega = \Omega_1 \cup \Gamma \cup \Omega_2$ with an unfitted mesh 
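We define the weighted average $\{\cdot\}$ and the jump $[\cdot]$ on $e \in \mathcal{E}_h^\Gamma$ by 

$$ \{v\} = \kappa_1 v_1 + \kappa_2 v_2, \qquad [v] = v_1 n_1 + v_2 n_2, \tag{4} $$

$$ \{q\} = \kappa_1 q_1 + \kappa_2 q_2, \qquad [q] = q_1 \cdot n_1 + q_2 \cdot n_2. \tag{5} $$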


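For the stability analysis of our schemes, we define $(\kappa_1, \kappa_2)$ on each element as 
follows: 

$$
\kappa_1 =
\begin{cases}
0 & \text{if } |K_1|/|K| < c_0 h_K,\\
1 & \text{if } |K_1|/|K| > 1 - c_0 h_K,\\
|K_1|/|K| & \text{otherwise},
\end{cases}
\qquad \kappa_2 = 1 - \kappa_1. \tag{6}
$$

Clearly, $0 \le \kappa_i \le 1$ and $\kappa_1 + \kappa_2 = 1$, so that $\{\cdot\}$ is a convex combination along $\Gamma$. 
Roughly speaking, we adopt the weight $\kappa_i = |K_i|/|K|$ suggested in [13] for general 
sub-elements and we set $\kappa_i = 0$ for $|K_i| < c_0 h_K |K|$. Here, the user-defined constant $c_0 > 
2\gamma_0\gamma_1$, and $\gamma_0$, $\gamma_1$ are the constants defined in (2) and (3), respectively. The dependence 
of $c_0$ on these generic constants is elaborated in Lemma 1. 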
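The weight selection described above can be sketched in a few lines. This is only an illustrative reading of the rule (volume-fraction weights, switched to 0 or 1 when one sub-element is very small); the function name `average_weights` and its arguments are hypothetical and not part of any published code, and the sketch assumes $c_0 h_K < 1/2$ so that the two cut-off cases cannot overlap.

```python
def average_weights(vol_K1, vol_K, h_K, c0):
    """Return (kappa_1, kappa_2) for one cut element K.

    vol_K1 : measure of the sub-element K_1 = K ∩ Omega_1
    vol_K  : measure of the whole element K
    h_K    : diameter of K
    c0     : user-defined constant (assumed here to satisfy c0 * h_K < 1/2)
    """
    threshold = c0 * h_K
    if not threshold < 0.5:
        raise ValueError("sketch assumes c0 * h_K < 1/2")
    ratio = vol_K1 / vol_K
    if ratio < threshold:           # K_1 is a tiny sliver: ignore its flux
        kappa_1 = 0.0
    elif ratio > 1.0 - threshold:   # K_2 is a tiny sliver: ignore its flux
        kappa_1 = 1.0
    else:                           # generic cut: volume-fraction weight
        kappa_1 = ratio
    return kappa_1, 1.0 - kappa_1
```

With these weights the average never relies on the normal derivative taken from a degenerate sub-element, which is the property the trace and inverse inequality of Lemma 1 exploits.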

Let $\chi_i$ be the characteristic function on $\Omega_i$ with $i = 1, 2$. Given a mesh $\mathcal{T}_h$, let 
$V_h$ be the continuous piecewise polynomial function space of degree $p \ge 1$ on the 
mesh. Let $V_h^0 = V_h \cap H_0^1(\Omega)$, $V_h^1 = V_h^0 \cdot \chi_1$ and $V_h^2 = V_h^0 \cdot \chi_2$. We define the 
XFE space as $V_h^\Gamma = V_h^1 + V_h^2$. 

Then, the DG-XFE method for the interface problem is: Find $u_h \in V_h^\Gamma$ such that 

$$ B_h(u_h, v_h) = F_h(v_h), \quad \forall v_h \in V_h^\Gamma, \tag{7} $$
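where 

$$
B_h(w, v) = \int_{\Omega_1 \cup \Omega_2} \alpha(x)\nabla w \cdot \nabla v
 - \int_\Gamma \{\alpha(x)\nabla w\} \cdot [v]
 - \beta \int_\Gamma \{\alpha(x)\nabla v\} \cdot [w]
 + \sum_{K \in \mathcal{T}_h^\Gamma} \frac{\eta_\beta}{h_K} \int_{e_K} [w] \cdot [v],
$$

$$
F_h(v) = \int_\Omega f v + \int_\Gamma g_N (\kappa_1 v_2 + \kappa_2 v_1)
 - \beta \int_\Gamma g_D \cdot \{\alpha(x)\nabla v\}
 + \sum_{K \in \mathcal{T}_h^\Gamma} \frac{\eta_\beta}{h_K} \int_{e_K} g_D \cdot [v].
$$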

For $\eta_\beta$ sufficiently large, the norm corresponding to the bilinear form $B_h(\cdot, \cdot)$ is 
uniformly equivalent to $\|\cdot\|_{B_h}$, which is defined by 


1015, = lvli iuo, У) лвл И у + Do ng вк На, (8) 
КЕ KETË 


The crucial component in regard to establishing this equivalence result and also 
the stability of bilinear forms is the control on the weighted normal derivatives, 
which is stated as a trace and inverse inequality in Lemma 1. 


Lemma 1 ([27, 28]) Let $\gamma_0$ and $\gamma_1$ be the constants defined in (2) and (3), respectively. 
If we choose $c_0 > 2\gamma_0\gamma_1$ in the definition (6) of $\kappa$, there exists a positive constant $h_0$ 
such that for all $h \in (0, h_0]$ and any interface segment/patch $e_K = K \cap \Gamma \in \mathcal{E}_h^\Gamma$, 
the following estimates hold on both sub-elements of $K$: 

$$ \|\kappa_i^{1/2} \nabla v_i\|_{0, e_K} \le \frac{C}{h_K^{1/2}} \|\nabla v_i\|_{0, K_i}, \quad \forall v_i \in P_p(K_i), \ i = 1, 2. \tag{9} $$

The coercivity and boundedness of $B_h(\cdot, \cdot)$ in its norm $\|\cdot\|_{B_h}$ are then a direct 
consequence of the Cauchy-Schwarz inequality. 


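Lemma 2 Let $V = H^2(\Omega_1 \cup \Omega_2)$ and $V(h) = V_h^\Gamma + V$. We have 

$$ B_h(w, v) \le C_b \|w\|_{B_h} \|v\|_{B_h}, \quad \forall w, v \in V(h), \tag{10} $$

and 

$$ B_h(v, v) \ge C_s \|v\|_{B_h}^2, \quad \forall v \in V_h^\Gamma, \tag{11} $$

provided the penalty parameter $\eta_\beta$ is chosen sufficiently large. 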
The XFE space has optimal approximation quality for piecewise smooth functions 
in H?”(Ω1 ∪ Ω2). The following theorem is proved in [28] as an analogue of 
Céa's lemma. 


Theorem 1 Assume that the interface $\Gamma$ is $C^2$-smooth and that the solution of the 
elliptic interface problem (1) satisfies $u \in H^s(\Omega_1 \cup \Omega_2)$, where $s \ge 2$ is an integer. 
Let $\mu = \min(p + 1, s)$. The following error estimates hold for any $h \in (0, h_0]$: If 
$\eta_\beta$ is chosen sufficiently large (see (11)) and $u_h$ is the solution to the first scheme 
of (7), then 

$$ \|u - u_h\|_{B_h} \lesssim h^{\mu - 1} \|u\|_{H^\mu(\Omega_1 \cup \Omega_2)}, \quad \forall\, 0 < h \le h_0. \tag{12} $$

The hidden constants in the above estimates depend on the angle condition of 
the mesh $\mathcal{T}_h$, the degree of the polynomials, the parameter in the scheme, and $\alpha(x)$, 
but are independent of the location of the interface relative to the mesh. Here, the 
constant $h_0$ is from Lemma 1. 


3 An Optimal Multigrid Method for (7) 


In this section, we propose a two-level geometric multigrid solver for the finite 
element problem (7). It is well known that an element $K$ with a “small” cut (i.e. 
$|K \cap \Omega_i|/|K| \ll 1$) has an adverse effect on the conditioning of the resulting 
stiffness matrices (see e.g. [3]). Our approach is based on the general theory of the 
successive subspace correction (SSC) method for solving, on a linear vector space 
$V = \sum_{i=1}^{J} V_i$ with inner product $(\cdot, \cdot)$, the equation $(Au, v) = (f, v)$ for all $v \in V$, where 
$A : V \to V$ is a symmetric positive definite operator. 

We apply SSC to the relatively simple case of two subspaces (i.e. J = 2), that is, 
У = Vi = = УФУ», with Vi = vp and V2 = Ve , where ve Е уг 15 the space 
of nodal basis functions that vanish on № : = {ху : [supp(;) П Г] = 0}. With a 
slight abuse of notation, the DG-XFE scheme induces a symmetric positive definite 
operator Вһ for В = 1. Let B, and ВГ be the restrictions of В» on v? and VF, 
respectively. Let Rh: ур — ур be approximately an inverse of Вр. We have this 
two-level successive subspace correction method (Algorithm 1). A similar idea 
has been employed in a special linear case in [32] and analyzed using the framework 
given in [30, 33]. 


Algorithm 1 The multigrid method for (7) 


Repeat the following iterative procedure until convergence: 


1. do subspace correction on vo with an inexact solver Rp; 
2. do subspace correction on уг with an exact solver (Gy. 
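The two steps of Algorithm 1 can be illustrated on a small dense system. The sketch below is not the authors' solver: it stands in for the two finite element subspaces with two (possibly overlapping) index sets `idx0` and `idx1`, uses a few Gauss-Seidel sweeps as the inexact solver on the first subspace (the paper uses a V-cycle there), and applies an exact solve on the second. All names are hypothetical.

```python
import numpy as np

def two_level_ssc(B, f, idx0, idx1, sweeps=2, tol=1e-10, max_iter=5000):
    """Multiplicative two-subspace correction for B u = f, with B SPD.

    idx0, idx1 : index lists whose union covers all unknowns; they play
                 the role of the "background" and "near-interface" spaces.
    Returns the approximate solution and the number of outer iterations.
    """
    u = np.zeros_like(f, dtype=float)
    B00 = B[np.ix_(idx0, idx0)]
    B11 = B[np.ix_(idx1, idx1)]
    L00 = np.tril(B00)                      # Gauss-Seidel splitting of B00
    f_norm = np.linalg.norm(f, np.inf)
    for it in range(1, max_iter + 1):
        # Step 1: inexact correction on the first subspace.
        for _ in range(sweeps):
            r = f - B @ u
            u[idx0] += np.linalg.solve(L00, r[idx0])
        # Step 2: exact correction on the second subspace.
        r = f - B @ u
        u[idx1] += np.linalg.solve(B11, r[idx1])
        # Relative residual in the maximum norm as stopping test.
        if np.linalg.norm(f - B @ u, np.inf) <= tol * f_norm:
            return u, it
    return u, max_iter
```

For example, with `B` the 1-D Laplacian matrix `tridiag(-1, 2, -1)` of size 12, `idx0 = list(range(8))` and `idx1 = list(range(5, 12))`, the iteration reproduces the direct solution to the requested tolerance.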
Obviously, Algorithm 1 defines an iterative method for solving $B_h u_h = f_h$. 
Denote by $A_h$ the iterator of the method; the error contraction property is then 
summarized in the following theorem. 


Theorem 2 ([28]) Assume that |1 — RnBrllg, < р < 1. Then Algorithm 1 is 
uniformly convergent with respect to the mesh size with 


А 


I — An B; |2 < ———, 
ПАВ, < 


where Л is a constant independent of h. 


When $\mathcal{T}_h$ is a shape-regular grid with a geometrical multilevel structure, 
a geometric multigrid process can be implemented on $V_h^0$, and the approximate 
inverse $R_h$ can be chosen to be the iterator of a V-cycle multigrid method. 


4 Numerical Tests 


In this section, we present some initial results to demonstrate the high-order 
accuracy and robustness of our method. A 2-D example was implemented in 
MATLAB. The numerical experiment for a 3-D case was carried out in the open 
source finite element toolbox PHG [26]. 


4.1 High-Order Numerical Quadratures on “cut” Elements 


Assembling the local stiffness matrix and the corresponding RHS for $K \in \mathcal{T}_h^\Gamma$ 
requires integration over irregularly shaped manifolds: 

$$ \int_{K \cap \Omega_i} u(x)\,dx \quad\text{and}\quad \int_{K \cap \Gamma} u(x)\,d\Gamma, \tag{13} $$

where $\Gamma$ is defined by the zero level set of a piecewise smooth function. 

Our implementation of (13) relies on a general-purpose and arbitrarily high 
order numerical quadrature algorithm proposed in [10]. The basic idea is to choose 
a local coordinate system with three orthogonal directions, decompose integrals 
in (13) into multiple 1-D integrals along these directions, and use 1-D Gaussian 
quadratures to compute these integrals. For 1-D Gaussian quadratures to work, the 
local coordinate system should be suitably chosen according to properties of K and 
Г to prevent essential singularities from appearing in the 1-D integrands, and the 
integration intervals are divided into subintervals at the non-essential singularities 
of the integrands. We note that the proposed algorithm only requires finding roots 
of univariate nonlinear functions in given intervals and evaluating the integrand, the 
level set function, and the gradient of the level set function at given points. It can 
achieve arbitrarily high order by increasing the orders of Gaussian quadratures, and 
does not need extra a priori knowledge about the integrand and the level set function. 
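The dimension-by-dimension idea can be illustrated with a small 2-D sketch. It is not the PHG implementation and does not reproduce the algorithm of [10]: it fixes the coordinate directions instead of choosing them adaptively, works on the unit square only, and locates interface crossings by sampling plus bisection. It integrates over the part of the square where a level set function `phi` is negative, splitting each inner 1-D integral at the roots of `phi(x, ·)`. All function names are hypothetical.

```python
import numpy as np

def gauss_1d(f, a, b, n):
    """n-point Gauss-Legendre approximation of the integral of f over [a, b]."""
    x, w = np.polynomial.legendre.leggauss(n)
    t = 0.5 * (b - a) * x + 0.5 * (b + a)
    return 0.5 * (b - a) * sum(wi * f(ti) for wi, ti in zip(w, t))

def bisect(g, a, b, tol=1e-14):
    """Root of g in [a, b], assuming g changes sign between a and b."""
    ga = g(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        gm = g(m)
        if (ga < 0) == (gm < 0):
            a, ga = m, gm
        else:
            b = m
    return 0.5 * (a + b)

def roots_in_interval(g, a, b, samples=64):
    """Sign-change roots of g on [a, b] found on a uniform sampling grid."""
    ts = np.linspace(a, b, samples + 1)
    vals = [g(t) for t in ts]
    roots = []
    for t0, t1, v0, v1 in zip(ts[:-1], ts[1:], vals[:-1], vals[1:]):
        if v0 == 0.0:
            roots.append(t0)
        elif v0 * v1 < 0.0:
            roots.append(bisect(g, t0, t1))
    return roots

def integrate_cut_square(u, phi, n=8):
    """Integral of u over {(x, y) in (0,1)^2 : phi(x, y) < 0}."""
    def inner(x):
        g = lambda y: phi(x, y)
        pts = [0.0] + roots_in_interval(g, 0.0, 1.0) + [1.0]
        total = 0.0
        for y0, y1 in zip(pts[:-1], pts[1:]):
            # Keep only the sub-intervals that lie inside {phi < 0}.
            if y1 - y0 > 1e-15 and g(0.5 * (y0 + y1)) < 0.0:
                total += gauss_1d(lambda y: u(x, y), y0, y1, n)
        return total
    return gauss_1d(inner, 0.0, 1.0, n)
```

The sketch is accurate when the outer integrand `inner(x)` is smooth, for instance when the interface is a graph over the x-axis. Handling interfaces that turn back on themselves is exactly where the choice of local coordinate system and the splitting of the outer interval, as described above, become necessary.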

This algorithm has been implemented in the files src/quad-interface.c 
and include/phg/quad-interface.h in PHG [26]. Extensive h- and 
p-convergence tests have been performed in [10] and are included in the sample code 
test/quad test2.c. 


4.2 2-D Numerical Examples 


Let the domain $\Omega$ be the unit square $(0, 1)^2$ and the interface $\Gamma$ be the zero level set of the 
function $\phi(x) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 - 1/7$. The subdomain $\Omega_1$ is characterized 
by $\phi(x) < 0$ and $\Omega_2$ by $\phi(x) > 0$. The domain $\Omega$ is partitioned into grids of squares 
with the same size $h$. The exact solution is chosen as 


_ J Moni ехр@лх2), (x1, x2) € Qı, 
u(xi, х2) = 1 | | 
Гоо sin(x ху) ѕіп(лхо), (x1, x2) Е Q2. 
The right-hand side can be computed accordingly. 

We implement Algorithm 1, with a V-cycle geometric multigrid based on the 
unfitted grid $\mathcal{T}_h$ acting as the coarse grid corrector. In each pre- and post-smoothing 
stage of the V-cycle iterator, we perform two Gauss-Seidel sweeps. We record the 
numerical results in Table 1. In these examples, the initial guess is 0, and the 
stopping criterion is 


k 0 i 
lfa — Bru” [о ИП — Bru loo < 107". 


From Table 1, we can see that the multigrid method converges uniformly with 
respect to the mesh size, which confirms our theoretical results. 


Table 1 Numerical performance of Algorithm 1 (2-D example) 

                            h        2^-2   2^-3   2^-4   2^-5   2^-6 
α1 : α2 = 1 : 10    p = 1   #iter      7     10     10     11     12 
                    p = 2   #iter     13     10     12     13     14 
α1 : α2 = 10 : 1    p = 1   #iter     24     30     29     27     25 
                    p = 2   #iter     24     23     22     21     20 
4.3 3-D Numerical Examples 


The settings of this numerical experiment are as follows. The domain is $\Omega = (0, 1)^3$. 
The interface consists of two touching spheres of radius 0.1 centered at (0.4, 0.5, 0.5) and 
(0.6, 0.5, 0.5). The exact solution is given by 

$$
u(x_1, x_2, x_3) =
\begin{cases}
\exp(x_1 + x_2 + x_3), & (x_1, x_2, x_3) \in \Omega_1,\\
\sin(x_1)\sin(x_2)\sin(x_3), & (x_1, x_2, x_3) \in \Omega_2.
\end{cases}
$$

The discontinuous coefficient function is defined such that $\alpha_1 = 1$ and $\alpha_2 = 100$. 
A convergence study is performed on a series of meshes generated by uniform 
refinements of an initial mesh consisting of 6 congruent tetrahedra. Relative errors 
and convergence rates of numerical solutions for $P_p$ elements for p = 1, 2, 3 and 4 
are listed in Table 2, with the quadrature order q = 2p + 3. The convergence rates 
are optimal for both $H^1(\Omega)$-errors (order p) and $L^2(\Omega)$-errors (order p + 1). For the 


Table 2 Errors and convergence orders of the numerical solutions (3-D example) 

Number of     Degrees of     Relative H^1    Relative L^2 
elements      freedom        error           error 

P1 element (p = 1, q = 2p + 3 = 5) 
      768            189     1.690e-01       1.686e-02 
     6144           1241     7.510e-02       3.403e-03 
   49,152           9009     3.514e-02       9.618e-04 
  393,216         68,705     1.658e-02       2.272e-04 
3,145,728        536,769     8.145e-03       4.869e-05 

P2 element (p = 2, q = 2p + 3 = 7) 
      768           1241     9.041e-03       4.150e-04 
     6144           9009     2.026e-03       4.323e-05 
   49,152         68,705     4.973e-04       5.171e-06 
  393,216        536,769     1.234e-04       6.413e-07 
3,145,728      4,243,841     3.070e-05       7.965e-08 

P3 element (p = 3, q = 2p + 3 = 9) 
      768           3925                     8.394e-05 
     6144         29,449                     5.864e-07 
   49,152        228,241                     3.683e-08 
  393,216      1,797,409                     2.321e-09 
3,145,728     14,266,945                     1.456e-10 

P4 element (p = 4, q = 2p + 3 = 11) 
      768           9009     2.971e-03       9.606e-05 
     6144         68,705     6.042e-07       7.560e-09 
   49,152        536,769     3.778e-08       2.380e-10 
  393,216      4,243,841     2.362e-09       7.481e-12 
time being, however, the design of a multigrid solver for the 3-D case is still ongoing. 
The computations for P1, P2 and P3 elements were done using 64-bit double 
precision and the linear systems were solved using MUMPS, but for the P4 element, to 
eliminate the influence of roundoff errors, the computations were done using 80-bit 
extended double precision and the linear systems were solved using the GMRES 
method with MUMPS in double precision as its preconditioner. The performance of 
Algorithm 1 will be reported in a future work. 
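The claimed rates can be checked directly from the tabulated errors. The short sketch below assumes that each uniform refinement halves the mesh size (the element count grows by a factor of 8) and that, for the P2 element, the first error column of Table 2 is the relative $H^1$ error and the second the relative $L^2$ error; the helper name `observed_orders` is hypothetical.

```python
import math

def observed_orders(errors):
    """Observed convergence orders log2(e_coarse / e_fine) for halved mesh sizes."""
    return [math.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]

# P2 element errors as listed in Table 2 (coarsest to finest mesh).
h1_errors_p2 = [9.041e-03, 2.026e-03, 4.973e-04, 1.234e-04, 3.070e-05]
l2_errors_p2 = [4.150e-04, 4.323e-05, 5.171e-06, 6.413e-07, 7.965e-08]

h1_orders = observed_orders(h1_errors_p2)
l2_orders = observed_orders(l2_errors_p2)
```

Under these assumptions the last entries of `h1_orders` and `l2_orders` are close to 2 and 3, that is, to p and p + 1 for p = 2.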
Stabilised Hybrid Discontinuous 
Galerkin Methods for the Stokes Problem 
with Non-standard Boundary Conditions 


Gabriel R. Barrenechea, Michał Bosy, and Victorita Dolean 


1 Introduction 


The aim of this paper is to discretise the Stokes problem with non-standard 
boundary conditions. In [1], a hybrid discontinuous Galerkin (hdG) method was 
proposed and analysed for this problem. The finite element method used was 
the combination of BDM elements of order k for the velocity, and discontinuous 
elements of order k − 1 for the pressure. In this paper we increase the order of 
the pressure space to k, while keeping the order for the velocity space fixed as k. 
Since this pair does not satisfy the inf-sup condition, a stabilisation term needs to be 
added. 

The stabilisation term referred to above can be built using a variety of 
approaches, but, roughly speaking, the stabilisation can be residual or non-residual. 
In [8] the authors added a mesh-dependent term penalising the gradient of the 
pressure to the formulation. Later, in [14] this method was restricted and reinterpreted 
as a Petrov-Galerkin scheme leading to the first consistent stabilised method, and 
further developments were presented in the works [7] and [13]. For an overview of 
different residual stabilised finite element methods for the Stokes problem, see the 
review paper [2]. 

Now, due to their nature, residual methods add unphysical couplings to the 
formulation, and modify all the entries of the stiffness matrix. Hence, non-residual 
methods where only a positive semi-definite term penalising the pressure is added 
have also been proposed. Examples of this type of method are the pressure 
gradient projection [9] and local pressure gradient stabilisation [3]. The methods 
just mentioned typically use two nested meshes in order to build the method. Thus, 
to avoid this complication, the local pressure gradient stabilisation has also been 
presented on the same mesh in [12]. Additionally, methods that use fluctuations of 
the pressure gradient are not effective when the finite element space for pressure 
is the piecewise constant space. The usual way to overcome this is to add pressure 
jumps to the formulation, as has been done, e.g., in [16]. These have been shown 
to be very effective, but they do somewhat tamper with the data structure of the code. 
To avoid this, the authors in [10] present an approach that is based on 
polynomial-pressure-projection. This method works for low-order polynomials, as was shown 
in [4], and preserves the symmetry of the original equation. 

In the light of the discussion of the previous paragraphs, in this work we propose 
a stabilised hdG method for the Stokes problem with non-standard boundary 
conditions. The method is reminiscent of the Dohrmann-Bochev method (from [10]), but 
uses the same velocity space used in the hdG method from [1]. 


1.1 Notations and Model Problem 


Let $\Omega$ be an open polygonal domain in $\mathbb{R}^2$ with Lipschitz boundary $\Gamma := \partial\Omega$. 
We use boldface font for tensor or vector variables, e.g. u is a velocity vector field. 
The scalar variables will be italic, e.g. p denotes the scalar pressure. We define 
the stress tensor $\sigma := \nu\nabla u - pI$ (where $\nu > 0$ is the fluid viscosity and $I$ is 
the identity matrix) and the flux as $\sigma_n := \sigma n$. In addition, we denote normal and 
tangential components as follows: $u_n := u \cdot n$, $u_t := u \cdot t$, $\sigma_{nn} := \sigma_n \cdot n$, where $n$ is 
the outward unit normal vector to the boundary $\Gamma$ and $t$ is a vector tangential to $\Gamma$ 
such that $n \cdot t = 0$. 
For $D \subset \Omega$, we use the standard $L^2(D)$ space with the following norm 

$$ \|f\|_D^2 := \int_D f^2 \, dx \quad \text{for all } f \in L^2(D). $$
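Let us define, for $m \in \mathbb{N}$, the following Sobolev spaces 

$$ H^m(D) := \{ v \in L^2(D) : \forall\, |\alpha| \le m, \ \partial^\alpha v \in L^2(D) \}, $$
$$ H(\mathrm{div}, D) := \{ v \in [L^2(D)]^2 : \nabla \cdot v \in L^2(D) \}, $$

where, for $\alpha = (\alpha_1, \alpha_2) \in \mathbb{N}^2$, $|\alpha| = \alpha_1 + \alpha_2$, and $\partial^\alpha = \dfrac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2}}$. In addition, we 
will use the standard semi-norm and norm for the Sobolev space $H^m(D)$ 

$$ |f|_{m,D}^2 := \sum_{|\alpha| = m} \|\partial^\alpha f\|_D^2, \qquad \|f\|_{m,D}^2 := \sum_{k=0}^{m} |f|_{k,D}^2 \quad \forall f \in H^m(D). $$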


In this work, we consider the two-dimensional Stokes problem with tangential-velocity 
and normal-flux (TVNF) boundary conditions 

$$
\begin{aligned}
-\nu\Delta u + \nabla p &= f && \text{in } \Omega,\\
\nabla \cdot u &= 0 && \text{in } \Omega,\\
\sigma_{nn} &= g && \text{on } \Gamma,\\
u_t &= 0 && \text{on } \Gamma,
\end{aligned}
\tag{1}
$$

where $u : \Omega \to \mathbb{R}^2$ is the unknown velocity field, $p : \Omega \to \mathbb{R}$ the pressure, $\nu > 0$ 
the viscosity, which is considered to be constant, and $f \in [L^2(\Omega)]^2$, $g \in L^2(\Gamma)$ are 
given functions. The restriction to homogeneous Dirichlet conditions on $u_t$ is made 
only to simplify the presentation. 

Let $\{\mathcal{T}_h\}_{h>0}$ be a regular family of triangulations of $\Omega$ made of triangles. For 
each triangulation $\mathcal{T}_h$, $\mathcal{E}_h$ denotes the set of its edges. In addition, for each element 
$K \in \mathcal{T}_h$, $h_K := \mathrm{diam}(K)$, and we denote $h := \max_{K \in \mathcal{T}_h} h_K$. We define the following 
Sobolev spaces on the triangulation $\mathcal{T}_h$ and the set of all edges $\mathcal{E}_h$ 

$$ L^2(\mathcal{E}_h) := \{ v : v|_E \in L^2(E) \ \forall E \in \mathcal{E}_h \}, $$
$$ H^m(\mathcal{T}_h) := \{ v \in L^2(\Omega) : v|_K \in H^m(K) \ \forall K \in \mathcal{T}_h \} \quad \text{for } m \in \mathbb{N}, $$

with the corresponding broken norms. 
Now we introduce the finite element spaces that discretise the above spaces. 
Let $k \ge 1$. We start by introducing the velocity and pressure spaces. To discretise 
the velocity u we use the Brezzi-Douglas-Marini space (see [5, Section 2.3.1]) of 
order k defined by 

$$ BDM_h^k := \{ v_h \in H(\mathrm{div}, \Omega) : v_h|_K \in [P_k(K)]^2 \ \forall K \in \mathcal{T}_h \}. $$
Associated to this space, we introduce the BDM projection $\Pi^k : [H^1(\Omega)]^2 \to 
BDM_h^k$ defined in [5, Section 2.5]. The pressure is discretised using the following 
space 

$$ Q_h^k := \{ q_h \in L^2(\Omega) : q_h|_K \in P_k(K) \ \forall K \in \mathcal{T}_h \}. $$

Associated to this space we define the local $L^2(K)$-projection $\Psi_K^k : L^2(K) \to 
P_k(K)$ for each $K \in \mathcal{T}_h$ as follows. For every $w \in L^2(K)$, $\Psi_K^k(w)$ is 
the unique element of $P_k(K)$ satisfying $\int_K \Psi_K^k(w) v_h \, dx = \int_K w v_h \, dx$ for all $v_h \in 
P_k(K)$, and we define the global projection $\Psi^k$ by $\Psi^k|_K := \Psi_K^k$ for all $K \in \mathcal{T}_h$. 

The last ingredient needed in the method described below is a finite element 
space for a family of Lagrange multipliers associated to the edges of the 
triangulation. These multipliers will be denoted by $\tilde{u}$ and are meant to approximate 
the tangential trace of the velocity u on the edges of the triangulation. For this, and 
in order to propose a discretisation with fewer degrees of freedom, we discretise the 
Lagrange multiplier using the space 

$$ M_h^{k-1} := \{ \tilde{v}_h \in L^2(\mathcal{E}_h) : \tilde{v}_h|_E \in P_{k-1}(E) \ \forall E \in \mathcal{E}_h, \ \tilde{v}_h = 0 \text{ on } \Gamma \}. $$

Furthermore, we introduce for all $E \in \mathcal{E}_h$ the $L^2(E)$-projection $\Phi_E^{k-1} : L^2(E) \to 
P_{k-1}(E)$ defined as follows. For every $\tilde{w} \in L^2(E)$, $\Phi_E^{k-1}(\tilde{w})$ is the unique element 
of $P_{k-1}(E)$ satisfying $\int_E \Phi_E^{k-1}(\tilde{w}) \tilde{v}_h \, ds = \int_E \tilde{w} \tilde{v}_h \, ds$ for all $\tilde{v}_h \in P_{k-1}(E)$, and we 
denote by $\Phi^{k-1} : L^2(\mathcal{E}_h) \to M_h^{k-1}$ the projection defined as $\Phi^{k-1}|_E := \Phi_E^{k-1}$ for all $E \in \mathcal{E}_h$. 


2 The Stabilised Method 


Our approach is to write the discrete problem with the same polynomial degree 
for the velocity and pressure spaces. In other words, denoting $V_h := BDM_h^k \times M_h^{k-1}$, 
we want to use the space $V_h \times Q_h^k$ instead of $V_h \times Q_h^{k-1}$ as was done in [1]. To 
do this, we need a proper stabilisation term, because this choice of spaces does 
not guarantee inf-sup stability. 

As the first ingredient in the definition of the stabilised method for (1), we use the 
same bilinear forms as in [1], namely 


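$$
\begin{aligned}
a((w_h, \tilde{w}_h), (v_h, \tilde{v}_h)) := \sum_{K \in \mathcal{T}_h} \Big(
 & \int_K \nu \nabla w_h : \nabla v_h \, dx
 - \int_{\partial K} \nu (\partial_n w_h)_t \big( (v_h)_t - \tilde{v}_h \big) \, ds
 + \varepsilon \int_{\partial K} \nu \big( (w_h)_t - \tilde{w}_h \big) (\partial_n v_h)_t \, ds \\
 & + \nu \frac{\tau}{h_K} \int_{\partial K} \Phi^{k-1}\big( (w_h)_t - \tilde{w}_h \big) \, \Phi^{k-1}\big( (v_h)_t - \tilde{v}_h \big) \, ds \Big),
\end{aligned}
$$

$$ b((v_h, \tilde{v}_h), q_h) := - \sum_{K \in \mathcal{T}_h} \int_K q_h \nabla \cdot v_h \, dx, $$

where $\varepsilon \in \{-1, 1\}$ and $\tau > 0$ is a stabilisation parameter. In addition, to compensate 
for the lack of inf-sup stability of the finite element spaces we have chosen, we 
introduce the bilinear form 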


$$ s(p_h, q_h) := \frac{1}{\nu} \int_\Omega (p_h - \Psi^{k-1} p_h)(q_h - \Psi^{k-1} q_h) \, dx. $$
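For the lowest-order case k = 1, the stabilising form penalises the difference between a piecewise linear pressure and its element-wise mean. The sketch below builds the corresponding 3 × 3 element matrix on one triangle. It assumes a nodal P1 basis, the scaling factor 1/ν, and the hypothetical function names shown; it is meant only to make the structure of the term concrete and is not the authors' implementation.

```python
import numpy as np

def p1_mass_matrix(area):
    """Exact mass matrix of the nodal P1 basis on a triangle of given area."""
    return area / 12.0 * np.array([[2.0, 1.0, 1.0],
                                   [1.0, 2.0, 1.0],
                                   [1.0, 1.0, 2.0]])

def pressure_projection_stab(area, nu):
    """Element matrix of (1/nu) * int_K (p - mean(p)) (q - mean(q)) dx for P1 pressures.

    Uses int_K (p - pbar)(q - qbar) dx = p^T (M - m m^T / |K|) q, where
    m_i = int_K phi_i dx and pbar = (m . p) / |K| is the P0 projection of p.
    """
    M = p1_mass_matrix(area)
    m = M.sum(axis=1)                 # integrals of the basis functions (= area / 3)
    return (M - np.outer(m, m) / area) / nu
```

The resulting matrix equals `area / (36 * nu)` times `[[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]`. It is symmetric and positive semi-definite, and constants lie in its kernel, so only the fluctuating part of the pressure is penalised.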


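With these ingredients we can now present the finite element method analysed in 
this work: Find $(u_h, \tilde{u}_h, p_h) \in V_h \times Q_h^k$ such that for all $(v_h, \tilde{v}_h, q_h) \in V_h \times Q_h^k$ 

$$ A((u_h, \tilde{u}_h, p_h), (v_h, \tilde{v}_h, q_h)) = \int_\Omega f \cdot v_h \, dx + \int_\Gamma g \, (v_h)_n \, ds, \tag{2} $$

where 

$$ A((u_h, \tilde{u}_h, p_h), (v_h, \tilde{v}_h, q_h)) := a((u_h, \tilde{u}_h), (v_h, \tilde{v}_h)) + b((v_h, \tilde{v}_h), p_h) + b((u_h, \tilde{u}_h), q_h) - s(p_h, q_h). $$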


2.1 Well-Posedness of the Discrete Problem 


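Let us consider the following norm on $V_h$ (see [1, Lemma 3.2] for a proof that this 
is actually a norm on $V_h$) 

$$ |||(w_h, \tilde{w}_h)|||^2 := \nu \sum_{K \in \mathcal{T}_h} \Big( |w_h|_{1,K}^2 + h_K \|\partial_n w_h\|_{\partial K}^2 + \frac{\tau}{h_K} \big\|\Phi^{k-1}\big( (w_h)_t - \tilde{w}_h \big)\big\|_{\partial K}^2 \Big). $$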

The first step towards proving the stability of Method (2) is the following weak 
inf-sup condition for b. 


Lemma 1 There exist constants C1, Co > 0, independent of hx and v, such that 
b ( (вв, 5») а) 


sp ———— ^ > Cala — С |a — va. 


= Уаһ Е ОЁ. 
(vp,5,)eV, ИИ (Vhs бл) ||| 


(3) 
Proof We consider an arbitrary $q_h \in Q_h^k$. Let © be a convex, open, Lipschitz set 
such that Q C ©, and let us consider following extension 


x qn in Q 
dh := .. & . 
0 in QV Q9 


Let now $ be the unique weak solution of the problem 


Аф = Gn in Q 
ф=0 олдо ` 


Since Q is convex, then фЕ H*(Q). Then w := Уфо belongs to [H'(Q)]?, and 
for := wr, 

b ((w, ti) an) = lanle Van € О}. (4) 
In addition, applying standard regularity results, see [5, Section 1.2], we get 


lwla < УФ) < са. (5) 


2 
In [1, Lemma 3.5] it is shown that there exists a Fortin operator II : Li (| — 


Уһ satisfying the following condition: for all v € [H ! (Q)]? the following holds 
b ((,5) qn) =b(M@).4n) Vane О", (6) 
ШП Co) II] < СМ ние. (7) 


Let (wn, Wn) := П (ш), then thanks to (6), (4) and the continuity of b (see [1, 
Lemma 3.3]) 


b (Cwn. in) аһ) = b (us) gr) =b ((w — m. i — in) ai = Vai) 


2 
> Пана = с | У Ion = wf | 
КЕТЬ 


Using the approximation properties of the BDM interpolation operator (see [5, 
Preposition 2.5.1]) and (5) 


qn — yg, l. . 


- 1 Е 
b (о. wn) а) > (lala — C203 |an — y lg, |.) | W| н\(о) 


> (с lanlo = C2 |an — W'ar 


;) (tn) 
where, in the last estimate, we have used the stability of the Fortin operator Π in the 


Ш. ||| norm (7). This proves the result with Cy = cm and C? = CA о 


Before showing an inf-sup condition, we prove the continuity of the bilinear form A. 


Lemma 2 There exists a constant C > 0 such that, for all $(w_h, \tilde{w}_h), (v_h, \tilde{v}_h) \in V_h$ 
and $r_h, q_h \in Q_h^k$, we have 

$$ \big| A((w_h, \tilde{w}_h, r_h), (v_h, \tilde{v}_h, q_h)) \big| \le C \, |||(w_h, \tilde{w}_h, r_h)|||_h \, |||(v_h, \tilde{v}_h, q_h)|||_h. \tag{8} $$

Proof We use the continuity of the bilinear forms (see [1, Lemma 3.3]) and the fact 
that the projection is a bounded operator. □ 


The final step towards stability is proving the inf-sup condition for the bilinear form A. 


Lemma 3 There exists $\beta > 0$ independent of $h_K$ such that for all $(w_h, \tilde{w}_h, r_h) \in 
V_h \times Q_h^k$ the following holds 

$$ \sup_{(v_h, \tilde{v}_h, q_h) \in V_h \times Q_h^k} \frac{A((w_h, \tilde{w}_h, r_h), (v_h, \tilde{v}_h, q_h))}{|||(v_h, \tilde{v}_h, q_h)|||_h} \ge \beta \, |||(w_h, \tilde{w}_h, r_h)|||_h. \tag{9} $$

As a consequence, Problem (2) is well-posed. 


Proof Let $(w_h, \tilde{w}_h, r_h) \in V_h \times Q_h^k$. The idea of the proof is to construct an 
appropriate $(v_h, \tilde{v}_h, q_h)$ such that 

$$ A((w_h, \tilde{w}_h, r_h), (v_h, \tilde{v}_h, q_h)) \ge c \, |||(w_h, \tilde{w}_h, r_h)|||_h \, |||(v_h, \tilde{v}_h, q_h)|||_h. $$

To achieve that we use coercivity of a (see [1, Lemma 3.4]), continuity of a (see [1, 
Lemma 3.3]) and Lemma 2. For details see [6]. □ 


2.2 Error Analysis 


In this section we present the error estimates for the method. The addition of the 
stabilising bilinear form $s(\cdot, \cdot)$ introduces a consistency error. However, according 
to [4], this should not be viewed as a serious flaw, as this consistency error can be 
bounded in an optimal way. The following result is the first step towards that goal. 


Lemma 4 Let $(u, p) \in [H^1(\Omega) \cap H^2(\mathcal{T}_h)]^2 \times L^2(\Omega)$ be the solution of 
problem (1) and $\tilde{u} = u_t$ on all edges of $\mathcal{E}_h$. If $(u_h, \tilde{u}_h, p_h) \in V_h \times Q_h^k$ solves (2), 
then for all $(v_h, \tilde{v}_h, q_h) \in V_h \times Q_h^k$ the following holds 


A ((u — un, à — ün, p — pi) (vn, ns an) = s (р.а). (10) 


Next, we introduce the following norm 

$$ |||(w, \tilde{w}, r)|||_h^2 := |||(w, \tilde{w})|||^2 + \frac{1}{\nu} \|r\|_\Omega^2 \tag{11} $$

and prove the following variant of Céa's lemma [11, Lemma 2.28] for this stabilised 
Stokes problem. 

Lemma 5 Let $(u, p) \in [H^1(\Omega) \cap H^2(\mathcal{T}_h)]^2 \times L^2(\Omega)$ be the solution of 
problem (1) and $\tilde{u} = u_t$ on all edges of $\mathcal{E}_h$. If $(u_h, \tilde{u}_h, p_h) \in V_h \times Q_h^k$ solves (2), 
then there exists C > 0, independent of h and $\nu$, such that 

$$
|||(u - u_h, \tilde{u} - \tilde{u}_h, p - p_h)|||_h \le C \inf_{(v_h, \tilde{v}_h, q_h) \in V_h \times Q_h^k} |||(u - v_h, \tilde{u} - \tilde{v}_h, p - q_h)|||_h + \frac{C}{\sqrt{\nu}} \|p - \Psi^{k-1} p\|_\Omega. \tag{12}
$$

Proof It is a combination of Lemmas 1, 2 and 3. For details see [6]. □ 


2 
Lemma 6 Let (u, p) € Li (DN н? (л) x L? (Q) be the solution of the 


problem (1) and $\tilde{u} = u_t$ on all edges of $\mathcal{E}_h$. If $(u_h, \tilde{u}_h, p_h) \in V_h \times Q_h^k$ solves (2), 
then there exists C > 0, independent of h and $\nu$, such that 


"E 1 
[| (и — ми, й — ŭn, p — pn) Illa < Сп (Vium Torte). 


Proof It is a combination of [1, Lemma 3.8] and Lemma 5 with the local $L^2$-projection 
approximation [11, Theorem 1.103]. □ 


3 Numerical Experiments 


The computational domain is the unit square $\Omega = (0, 1)^2$. We present the results 
for k = 1, that is, the discrete space is given by $BDM_h^1 \times M_h^0 \times Q_h^1$. We test 
both the symmetric method ($\varepsilon = -1$) and the non-symmetric method ($\varepsilon = 1$). We 
have followed the recommendation given in [15, Section 2.5.2] and taken $\tau = 6$. 
We choose the right-hand side f and the boundary condition g such that the exact 
solution is given by 

$$ u = \mathrm{curl}\big[ (1 - \cos((1 - x)^2)) \sin(x^2) \sin(y^2) (1 - \cos((1 - y)^2)) \big], \qquad p = \tan(xy). $$


In Fig. la and b we depict the errors for both the symmetric and non-symmetric 
cases, respectively. We can see that they not only validate the theory from Sect. 2.2, 
but also perform an optimal h? convergence rate for ||u — ир ||. Furthermore, we 
observe an increased order of convergence for || p — рь|о. In fact, the error seems 
to decrease with О (А3/2), rather than the O (h) predicted by the theory. 

To stress the last point made in the previous paragraph, in Table 1 we compare
the $L^2$ error of the pressure ($\|p - p_h\|_0$) for the hdG method introduced in [1] and
the stabilised hdG method from Sect. 2. Columns $p_h \in Q_h^0$ are associated with the hdG
method and $p_h \in Q_h^1$ with the stabilised hdG one. There, we confirm that the pressure
error for the stabilised version is much smaller than the one for the inf-sup stable
case, in addition to having an increased order of convergence.

Fig. 1 Convergence of the stabilised method with k = 1. (a) Symmetric bilinear form ($\varepsilon = -1$).
(b) Non-symmetric bilinear form ($\varepsilon = 1$)

Table 1 Comparison of the error of the pressure $\|p - p_h\|_0$

Symmetric bilinear form ($\varepsilon = -1$)    Non-symmetric bilinear form ($\varepsilon = 1$)
$h$        $p_h \in Q_h^0$
$2^{-1}$   0.159019   0.090624
$2^{-2}$   0.084875   0.047488
$2^{-3}$   0.043313   0.009449
$2^{-4}$   0.021513   0.003516
$2^{-5}$   0.010707   0.001269
$2^{-6}$   0.005346   0.002171
$2^{-7}$   0.002672   0.000453
$2^{-8}$   0.001336   0.000161
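
The convergence orders discussed above can be estimated directly from the tabulated errors. The following is a minimal sketch, assuming that the mesh size is halved between consecutive rows; the function name `observed_orders` and the labels `column_a` and `column_b` are hypothetical and simply refer to the two numeric columns of Table 1 as printed.

```python
import math


def observed_orders(errors):
    """Observed convergence order between successive refinements h -> h/2."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


# The two numeric columns of Table 1, for h = 2^-1, ..., 2^-8 (values as printed).
column_a = [0.159019, 0.084875, 0.043313, 0.021513,
            0.010707, 0.005346, 0.002672, 0.001336]
column_b = [0.090624, 0.047488, 0.009449, 0.003516,
            0.001269, 0.002171, 0.000453, 0.000161]

orders_a = observed_orders(column_a)
orders_b = observed_orders(column_b)

for name, orders in (("first column", orders_a), ("second column", orders_b)):
    print(name, [round(rate, 2) for rate in orders])
```

On the finest pair of meshes the first column gives an order close to 1 and the second an order close to 3/2, in line with the rates quoted in the text.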


4 Conclusion 


In this work we have applied the idea introduced in [10] to stabilise the hdG
method proposed in [1] for the Stokes problem with TVNF boundary conditions.
The method adds a simple, symmetric term to the formulation, and allows us to
use a higher order pressure space, which, in turn, improves the pressure convergence
(although a proof of this fact is, in general, not available). This approach was
also applied to NVTF boundary conditions (see [6]) and can be used for other
discontinuous Galerkin methods that deal with Stokes or nearly incompressible
elasticity problems.

Further testing using higher order discretisations is needed to assess whether this
approach provides an increase of the convergence rate for the pressure. Numerical
tests with higher polynomial degrees for discontinuous finite element methods are
therefore of interest for further research.

RBF Based CWENO Method

Jan S. Hesthaven, Fabian Mönkeberg, and Sara Zaninelli 


1 Introduction 


A broad range of physical phenomena can be described by hyperbolic conservation 
laws of the form 


\[ u_t + f(u)_x = 0, \quad (x,t) \in \mathbb{R} \times \mathbb{R}_+, \qquad u(0) = u_0, \tag{1} \]

with the conserved variables $u : \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}^N$ and the flux function $f : \mathbb{R}^N \to \mathbb{R}^N$.
The nonlinear behavior of $f$ can lead to complex solutions, most notably
shocks. It is well-known that high-order methods give good results for smooth data,
but for discontinuous ones spurious oscillations are introduced. A popular class of
methods to solve (1) is the finite volume method, which is based on a discretization
in space $\ldots < x_{i-1/2} < x_{i+1/2} < \ldots$ and the average values $\bar{u}_i$ of its cells
$C_i = [x_{i-1/2}, x_{i+1/2}]$. It is defined by the semi-discrete scheme

\[ \frac{d\bar{u}_i}{dt} = -\frac{F_{i+1/2} - F_{i-1/2}}{\Delta x}, \tag{2} \]

where the numerical flux term $F_{i+1/2}$ depends on the values $\{\bar{u}_{i-k}, \ldots, \bar{u}_{i+p-k}\}$
with $0 < k < p - 1$. For more details we refer the reader to [15, 20, 22].
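
To make the semi-discrete scheme (2) concrete, here is a minimal first-order sketch. It assumes a uniform periodic grid, uses the cell averages themselves as interface values (no high-order reconstruction), and takes a local Lax-Friedrichs flux as the monotone flux. The names `lax_friedrichs_flux` and `semi_discrete_rhs`, and the (left, right) argument order, are conventions of this sketch and not taken from the paper.

```python
import math


def lax_friedrichs_flux(u_left, u_right, f, max_speed):
    """Monotone Lax-Friedrichs-type flux evaluated from the two interface states."""
    return 0.5 * (f(u_left) + f(u_right)) - 0.5 * max_speed * (u_right - u_left)


def semi_discrete_rhs(u_bar, dx, f, max_speed):
    """Right-hand side of scheme (2) on a periodic grid, first-order interface values."""
    n = len(u_bar)
    # flux[i] approximates F_{i+1/2}: cell i on the left, cell i+1 on the right
    flux = [lax_friedrichs_flux(u_bar[i], u_bar[(i + 1) % n], f, max_speed)
            for i in range(n)]
    # flux[i - 1] is F_{i-1/2}; for i = 0 the index wraps around (periodicity)
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]


n = 32
dx = 1.0 / n
# cell-centre samples of sin(2 pi x), used here as a stand-in for the cell averages
u_bar = [math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n)]
burgers_flux = lambda u: 0.5 * u * u

rhs = semi_discrete_rhs(u_bar, dx, burgers_flux, max_speed=1.0)
total_change = sum(rhs) * dx  # flux differences telescope on a periodic grid
print(total_change)
```

Because the flux differences telescope, the discrete total of the conserved variable does not change in time, which is the conservation property that motivates the flux form of (2).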

The class of essentially nonoscillatory (ENO) methods, introduced by Harten et
al. [14], reduces spurious oscillations to a minimum. They are based on a monotone
numerical flux function $F(u, v)$ and a high-order accurate reconstruction $s_i(x)$ for
each cell $i$. The central idea is to choose the least oscillating interpolation function
$s_i$ and define the numerical flux $F_{i+1/2} = F(u^+_{i+1/2}, u^-_{i+1/2})$ with $u^{\pm}_{i+1/2}$ being
the evaluation of $s_{i+1}$ and $s_i$ at the interface $x_{i+1/2}$. Based on the ENO method,
Jiang and Shu [19] introduced the weighted ENO (WENO) method which considers
different interpolation polynomials, based on different stencils, and combines them
in a nonoscillatory manner to maximize the attainable accuracy. Further results on
ENO and WENO methods can be found in [10, 11, 16].


2 CWENO 


The CWENO method is based on the WENO method and was introduced by Levy et 
al. [23] as a third order method. Further analysis and generalization to higher orders 
on general grids can be found in [6, 7]. 

Let us consider the standard semi-discrete formulation (2) with a monotone
flux function $F(u, v)$. The goal is to construct a reconstruction $P_{rec}$ for each
cell $C_i$ based on the stencil $\{C_{i-k}, \ldots, C_{i+k}\}$ for $k \in \mathbb{N}$. In the smooth regions
the algorithm should choose a polynomial of degree $2k$ which interpolates the
central stencil $\bar{u}_{i-k}, \ldots, \bar{u}_{i+k}$ in the mean value sense. In case of a non-smooth
solution it chooses a polynomial of degree $k$ on one stencil $\{C_{i-k+l}, \ldots, C_{i+l}\}$ that
avoids the discontinuity. Given the reconstruction, the high-order numerical flux is
$F_{i+1/2} = F(P_{rec,i+1}(x_{i+1/2}), P_{rec,i}(x_{i+1/2}))$.

Specifically, let us consider $P_{opt}$ as the polynomial of degree $2k$ that interpolates
all data in the $2k + 1$ stencil and the polynomials $P_l$ of degree $k$ that interpolate the
data on the stencil $\{C_{i-k+l-1}, \ldots, C_{i+l-1}\}$ for $l = 1, \ldots, k + 1$. Furthermore, the
reconstruction depends on the choice of the positive real coefficients $d_0, \ldots, d_{k+1} \in [0, 1]$
such that $\sum_{l=0}^{k+1} d_l = 1$, $d_0 \neq 0$. Then, the reconstruction polynomial of degree
$2k$ is


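\[ P_{rec}(x) = \sum_{l=0}^{k+1} \omega_l P_l(x), \tag{3} \]
with
\[ P_0(x) = \frac{1}{d_0}\Big( P_{opt}(x) - \sum_{l=1}^{k+1} d_l P_l(x) \Big), \tag{4} \]
and the nonlinear coefficients $\omega_l$ that are defined as
\[ \omega_l = \frac{\alpha_l}{\sum_{j=0}^{k+1} \alpha_j}, \qquad \alpha_l = \frac{d_l}{(I[P_l] + \epsilon)^t}, \tag{5} \]

where $I[P_l]$ indicates the smoothness of $P_l$, $1 \gg \epsilon > 0$ and $t \geq 2$. A
classical indicator of smoothness in the cell $C$ for a polynomial is the Jiang-Shu
indicator [19]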


\[ I[P] = \sum_{l \geq 1} \operatorname{diam}(C)^{2l-1} \int_C \Big( \frac{d^l}{dx^l} P(x) \Big)^2 dx. \tag{6} \]

The choice of $\epsilon$ is of importance: if it is too small, it might affect the order of
convergence. On the other hand, if it is too big, spurious oscillations may occur.
Cravero et al. [7] show that the choice $\epsilon = \hat{\epsilon} h^p$ for $p = 1, 2$ leads to the maximal
order of convergence. As proposed in [7] we define the coefficients $d_l$ over the
temporary weights


"E | > k+2 


and we choose $d_0 \in (0, 1)$ for the high-order polynomial. This gives us a possible
choice for the coefficients

\[ d_l = \frac{\hat{d}_l}{\sum_{i=1}^{k+1} \hat{d}_i}\,(1 - d_0). \tag{8} \]

The main difference with respect to the classical WENO method is that for the
smooth case we are not constructing $P_{opt}$ out of the polynomials $P_l$, but we build it
independently by solving an additional system of equations. This method has the
advantage that it is easier to generalize to general grids in high dimensions, while
maintaining high-order accuracy.
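
As an illustration of (3)-(6), the following sketch evaluates a polynomial CWENO reconstruction for the lowest-order case k = 1 on a uniform grid. It is a sketch under stated assumptions: the two low-degree linear weights are taken equal, $\epsilon$ is a fixed number rather than $\hat{\epsilon} h^p$, the indicator (6) is applied to each $P_l$ including $P_0$, and the local coordinate has its origin at the cell centre. All function names are hypothetical.

```python
def quad_eval(coeffs, x):
    """Evaluate a + b*x + c*x^2 for coeffs = (a, b, c)."""
    a, b, c = coeffs
    return a + b * x + c * x * x


def jiang_shu(coeffs, h):
    """Indicator (6) on the cell [-h/2, h/2] for a + b*x + c*x^2, with diam(C) = h."""
    _, b, c = coeffs
    return b * b * h * h + (13.0 / 3.0) * c * c * h ** 4


def cweno3(u_left, u_mid, u_right, h, x, d0=0.5, eps=1e-6, t=2):
    """Evaluate P_rec at offset x from the cell centre; returns (value, weights)."""
    d = [d0, 0.5 * (1.0 - d0), 0.5 * (1.0 - d0)]  # linear weights, summing to 1
    # degree-1 polynomials matching two neighbouring cell averages each
    p1 = (u_mid, (u_mid - u_left) / h, 0.0)
    p2 = (u_mid, (u_right - u_mid) / h, 0.0)
    # degree-2 polynomial matching all three cell averages
    c = (u_right - 2.0 * u_mid + u_left) / (2.0 * h * h)
    p_opt = (u_mid - c * h * h / 12.0, (u_right - u_left) / (2.0 * h), c)
    # P_0 from (4), computed coefficient by coefficient
    p0 = tuple((po - d[1] * a1 - d[2] * a2) / d[0]
               for po, a1, a2 in zip(p_opt, p1, p2))
    polys = [p0, p1, p2]
    # nonlinear weights (5)
    alpha = [d[l] / (jiang_shu(polys[l], h) + eps) ** t for l in range(3)]
    total = sum(alpha)
    weights = [a / total for a in alpha]
    value = sum(w * quad_eval(p, x) for w, p in zip(weights, polys))
    return value, weights


# smooth (linear) data: every candidate agrees, weights reduce to the linear ones
smooth_value, smooth_weights = cweno3(1.0, 2.0, 3.0, h=1.0, x=0.5)
# a jump between the middle and the right cell: the left-biased stencil dominates
shock_value, shock_weights = cweno3(0.0, 0.0, 1.0, h=1.0, x=0.5)
print(smooth_value, smooth_weights)
print(shock_value, shock_weights)
```

When the nonlinear weights coincide with the linear ones, (3) and (4) combine to give exactly $P_{opt}$, which is why the high-order polynomial does not have to be assembled from the low-order ones.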


3 Radial Basis Functions 


An alternative to the classical polynomial interpolation is the interpolation with
radial basis functions (RBF). RBFs were proposed in the seminal work by Hardy
[13]. They have been successfully applied in scattered data interpolation [4, 9, 17,
24, 27] and as a basis for a generalized finite difference method (RBF-FD) [5, 12].
Their advantage is the flexibility in high dimensions and the possibility to reduce the
risk of ill-conditioned point constellations. Their disadvantage is the ill-conditioning
of the interpolation matrix for small grid sizes [8, 21, 26].

The RBF interpolation is based on a basis $B$, obtained from a univariate
continuous function $\phi : \mathbb{R}_+ \to \mathbb{R}$, composed with the Euclidean norm centered
at the data points

\[ \phi(x - x_j) := \phi(\varepsilon \|x - x_j\|), \tag{9} \]




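Table 1 Commonly used RBFs with $\mathbb{N} \not\ni \nu > 0$, $k \in \mathbb{N}$

RBF                        $\phi(r)$                                       Order
Infinitely smooth RBFs
Multiquadratics            $(1 + (\varepsilon r)^2)^{\nu}$                 $\lceil \nu \rceil$
Inverse multiquadratics    $(1 + (\varepsilon r)^2)^{\nu}$, $\nu < 0$      0
Gaussians                  $\exp(-(\varepsilon r)^2)$                      0
Piecewise smooth RBFs
Polyharmonic splines       $r^{2k-d}$                                      $k$
                           $r^{2k-d} \log(r)$                              $k$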


with the shape parameter $\varepsilon$. Some common RBFs can be found in Table 1. Thus,
for given scattered data points $X = (x_1, \ldots, x_n)^T$ with $x_j \in \mathbb{R}^d$ and corresponding
values $f_1, \ldots, f_n \in \mathbb{R}$ we look for

\[ s(x) = \sum_{j=1}^{n} a_j \phi(x - x_j) + p(x), \tag{10} \]

with a polynomial $p \in \Pi_{m-1}(\mathbb{R}^d)$, $m \in \mathbb{N}$, the interpolation condition $s(x_j) = f_j$
and the additional constraints

\[ \sum_{j=1}^{n} a_j q(x_j) = 0, \quad \text{for all } q \in \Pi_{m-1}(\mathbb{R}^d), \tag{11} \]

with the coefficients $a_j \in \mathbb{R}$ for all $j = 1, \ldots, n$.
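
The interpolation problem (10)-(11) amounts to one small saddle-point system. The sketch below assumes point values in one dimension, the multiquadric with exponent 1/2 (so that a constant polynomial part, m = 1, suffices), and a plain Gaussian elimination in place of the stabilised algorithms cited in the text. All names are hypothetical, and the shape parameter value is chosen only to keep this toy system well conditioned.

```python
import math


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def multiquadric(r, shape):
    """phi(r) = (1 + (shape * r)^2)^(1/2)."""
    return math.sqrt(1.0 + (shape * r) ** 2)


def rbf_fit(nodes, values, shape):
    """Solve (10)-(11) with a constant polynomial part; returns (a_j, constant)."""
    n = len(nodes)
    system = [[multiquadric(abs(xi - xj), shape) for xj in nodes] + [1.0]
              for xi in nodes]
    system.append([1.0] * n + [0.0])  # constraint (11) for q = 1: sum of a_j is zero
    solution = solve_linear(system, list(values) + [0.0])
    return solution[:n], solution[n]


def rbf_eval(x, nodes, coeffs, constant, shape):
    return constant + sum(a * multiquadric(abs(x - xj), shape)
                          for a, xj in zip(coeffs, nodes))


shape = 4.0
nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [math.sin(2.0 * math.pi * x) for x in nodes]
coeffs, constant = rbf_fit(nodes, values, shape)
residual = max(abs(rbf_eval(x, nodes, coeffs, constant, shape) - f)
               for x, f in zip(nodes, values))
print(residual, sum(coeffs))
```

For small shape parameters or many nodes the same system becomes severely ill-conditioned, which is the difficulty the text refers to.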
The same concept can be applied in the case of cell-averages. We seek functions

\[ s(x) = \sum_{j=1}^{n} a_j \lambda^{\xi}_{C_j} \phi(x - \xi) + p(x), \qquad p \in \Pi_{m-1}(\mathbb{R}^d), \tag{12} \]
such that
\[ \lambda_{C_j} s = \bar{u}_j, \quad \text{for all } j = 1, \ldots, n, \tag{13a} \]
\[ \sum_{j=1}^{n} a_j \lambda_{C}(p) = 0, \quad \text{for all } C \in \{C_1, \ldots, C_n\}, \tag{13b} \]

with the averaging operator $\lambda_C f = \frac{1}{|C|} \int_C f(x)\,dx$. A well-known problem with
RBFs is the high condition number of the interpolation matrix for small grid sizes
or small shape parameters [8, 21, 26]. This problem can be resolved by using the
vector-valued rational approximation method [28].

4 RBF-CWENO


Methods combining RBFs and essentially nonoscillatory methods have been proposed,
e.g. RBFs with ENO [18, 25] and RBFs with WENO [1-3]. The advantage of
the CWENO method over the WENO method is its flexibility on general grids and
its independence of the construction of a high-order interpolation function out of
lower order ones. This facilitates the use of the whole grid in smooth regions and is
important for non-polynomial interpolation functions which cannot be combined to
a higher order function.

We propose the RBF-CWENO method which works as the classical CWENO
method with the reconstruction function (3) and the weights (5), but as interpolation
function we use RBFs instead of polynomials. Since the problem of the ill-conditioning
can be solved by using the vector-valued rational approximation
method [28], the main challenge for RBF methods is the choice of the smoothness
indicator. For polyharmonic splines, Aboiyar et al. [1] use the semi-norm of the
Beppo-Levi space and Bigoni et al. [3] use a modified version of the Jiang-Shu
indicator (6).


4.1 Smoothness Indicator 


The smoothness indicator is the heart of the essentially nonoscillatory methods. We 
consider one based on the one introduced by Bigoni and Hesthaven [3] 


Їз] = Уд | (Z0 y'a. 


+1 gl 2 
2g41 98 W 
+ Ах; f. (= mic OA = D Я 


Ј 


(14) 


where the first part is the sum of the derivatives of the polynomial part and the
second term expresses the highest derivative of the RBF-part. The original Jiang-Shu
indicator applied to (12) would include the lower derivatives of the RBF-part
plus all mixed terms, but we find this to be less efficient. For simplicity the integrals
can be approximated with a simple mid-point rule.

We face again the problem of ill-conditioning when recovering the coefficients
$a_j$. Numerical examples indicate that small shape parameters improve the accuracy,
but they do not affect the choice of the stencil using this smoothness indicator. Thus,
we use a bigger shape parameter $\varepsilon_R$, that is smaller than the smallest distance to a
singularity

\[ \varepsilon_R = 0.95 \Big( \max_{i,j \leq N} |x_i - x_j| \Big)^{-1}, \tag{15} \]

which ensures the solvability of the system of equations [28].


5 Numerical Results 


We now discuss the numerical results of the RBF-CWENO method and compare
it with the RBF-WENO method [3] and the classical ENO method [14]. All
methods use the Lax-Friedrichs numerical flux and integration in time is done
using the SSPRK-5 method [15] with time step $dt = \mathrm{CFL} \cdot \Delta x / \lambda_{max}$ and the
maximal eigenvalue $\lambda_{max}$ of $\nabla_u F$. Furthermore, we use the vector-valued rational
approximation approach [28] to circumvent ill-conditioning of the interpolation
matrix and a shape parameter $\varepsilon = 0.1$. For the nonlinear weights (5) we choose
$\epsilon = \hat{\epsilon} h^2$ with $\hat{\epsilon} = 0.1$.


5.1 Linear Advection Equation


Let us consider the linear advection equation

\[ u_t + a u_x = 0, \qquad x \in [0, 1], \tag{16} \]
with wave speed $a = 1$, initial condition $u_0(x) = \sin(2\pi x)$ and periodic boundary
conditions [22]. Note that for $k = 3$ we expect the order of convergence to be 7,
therefore we use the reduced time step $dt = \mathrm{CFL} \cdot \Delta x^{7/5} / \lambda_{max}$ to recover the
right order of convergence. The correct order of convergence of the RBF-CWENO
method is shown in Table 2 and it seems to be more accurate than the RBF-WENO
method.


5.2 Burgers' Equation
Considering Burgers' equation

\[ u_t + \frac{1}{2}(u^2)_x = 0, \qquad x \in [0, 1], \tag{17} \]

Table 2 Convergence rates of RBF-CWENO using multiquadratics for the linear advection
equation at time t = 0.05


RBF-CWENO 


| 
к Bmw — [Rae 
L[16 [весы |= [поет [uso |-  |L5754e-02 |- 
4.8924e—08 [169 
12608603 |196 
2931-05 [376 
2308—06 | 5.34 
a| 16 [23796-05 |_| 73671606 |- |&IMle-06 |-  |s4401e-04 |- 
2293805 [560 
4THTe-06 369 
2395607 |374 
19221e-98 |376 
з | 16 [38815-05 |- [1.331905 |- |77293е 06 |-  [22578e-04 |- 
7348306 [494 
L40756-07 |571 
[451009 660 
20120611 [617 


We use shape parameter $\varepsilon = 0.1$, CFL = 0.01


Fig. 1 Burgers' equation at t = 0.3 with $u_0 = \sin(2\pi x)$ solved by using the RBF-CWENO method
with MQ interpolants of order k = 3 (curves for h = 1/16, 1/32, 1/64, 1/128 and a reference solution)


we analyze its robustness with respect to discontinuities. In Fig. 1 we report the
results obtained with CFL = 0.5 at t = 0.3. We observe no oscillations around
the discontinuity at x = 0.5 and, as expected, an increasing accuracy for an increasing
number of elements.

5.3 Euler Equations


The one-dimensional Euler equations express conservation of mass, momentum and
the total energy. They can be described by the density $\rho$, the mass flow $m$, the energy
per unit volume $E$ and the pressure $p$ through

\[ \begin{pmatrix} \rho \\ m \\ E \end{pmatrix}_t + \begin{pmatrix} m \\ \frac{m^2}{\rho} + p \\ \frac{m}{\rho}(E + p) \end{pmatrix}_x = 0, \tag{18} \]

with $p = (\gamma - 1)\big(E - \frac{m^2}{2\rho}\big)$ for an ideal gas with the ratio of specific
heats $\gamma = 1.4$ [15]. For $k = 3$ we need to change the nonlinear weights (5) by using
$\epsilon = \hat{\epsilon} h^2$ with $\hat{\epsilon} = 10^{-7}$ to avoid oscillations.
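
The flux in (18) and the ideal-gas closure translate directly into code. The following sketch assumes the conserved variables are ordered as (density, mass flow, energy); the helper `conserved_from_pressure`, which converts a state given by density, mass flow and pressure, is a hypothetical convenience and not part of the paper.

```python
GAMMA = 1.4  # ratio of specific heats used in the text


def pressure(rho, m, energy, gamma=GAMMA):
    """Ideal-gas pressure from the conserved variables."""
    return (gamma - 1.0) * (energy - 0.5 * m * m / rho)


def euler_flux(rho, m, energy, gamma=GAMMA):
    """Flux vector of the one-dimensional Euler equations (18)."""
    p = pressure(rho, m, energy, gamma)
    return (m, m * m / rho + p, (energy + p) * m / rho)


def conserved_from_pressure(rho, m, p, gamma=GAMMA):
    """Build (rho, m, E) from density, mass flow and pressure."""
    energy = p / (gamma - 1.0) + 0.5 * m * m / rho
    return (rho, m, energy)


# a gas at rest with unit density and unit pressure
state = conserved_from_pressure(1.0, 0.0, 1.0)
flux = euler_flux(*state)
print(state, flux)
```

For a gas at rest only the momentum flux is non-zero and equals the pressure, which gives a quick sanity check of the implementation.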


5.3.1  Sod's Shock Tube Problem 


Sod's shock tube problem describes two colliding gases in [0, 1] with different
densities given by the initial conditions

\[ (\rho_0, m_0, p_0) = \begin{cases} (1, 0, 1) & \text{if } x < 0.5, \\ (0.125, 0, 0.1) & \text{if } x \geq 0.5. \end{cases} \tag{19} \]

This results in a rarefaction wave followed by a contact and a shock discontinuity
which separate the domain into four regions with constant variables. The RBF-CWENO
method resolves it well, see Fig. 2. For k = 3, we observe minor


oscillations, but their amplitude decreases for an increasing number of elements.
Furthermore, we observe the increasing accuracy for k = 3 compared to k = 2.

Fig. 2 Results for the Sod shock tube problem at t = 0.2 solved by using RBF-CWENO with MQ
interpolants of order k = 2, 3 on characteristic variables (left: k = 2, right: k = 3)

Fig. 3 Results for the Euler shock entropy problem at t = 1.8 solved by using RBF-CWENO with
MQ interpolants of order k = 2 on characteristic variables (Left) and a comparison with WENO,
ENO2 and ENO5 for N = 256 cells (Right)


5.3.2 Shu-Osher Shock-Entropy Wave Interaction Problem 


The Shu-Osher problem describes the interaction of a discontinuity with a low
frequency wave which introduces some high frequency waves. Its initial conditions
are

\[ (\rho_0, m_0, p_0) = \begin{cases} (3.857143, 2.629369, 10.33333) & \text{if } x < -4, \\ (1 + 0.2 \sin(5x), 0, 1) & \text{if } x \geq -4. \end{cases} \tag{20} \]

In Fig. 3, we observe on the left side the increasing accuracy for an increasing number
of elements for k = 2. On the right side we see its good approximative behaviour
compared to the existing methods ENO2, ENO5 and the corresponding WENO.
In particular we observe that the performance of the RBF-CWENO (k = 2) is
comparable to ENO5 and superior to WENO (k = 2).


6 Conclusion 


In this work, we introduce the RBF-CWENO method that relies on the CWENO
method [23] and the use of radial basis functions for the interpolation. We develop
a smoothness indicator that is based on RBFs but works similarly to the one for
polynomials. Furthermore, we tackle the problem of the choice of the weight
$1 \gg \epsilon > 0$. For $\epsilon = \hat{\epsilon} h^2$ with $\hat{\epsilon} = 0.1$ we get the right order of convergence, but for
the 7th order method (k = 3) we choose $\hat{\epsilon} = 10^{-7}$ to reduce spurious oscillations
for the Euler equations.

Moreover, we should point out that the choice of the linear weight $d_0$ can
influence the result; indeed if it is too close to 1 then the reconstruction almost
coincides with $P_{opt}$, which can lead to spurious oscillations in case of discontinuous
solutions. We present multiple numerical examples to show the robustness of the
method.

We can conclude that the RBF-CWENO method performs comparably to the
existing RBF-WENO and ENO methods in one dimension. The advantage of
RBFs is clearer when considering unstructured grids in higher dimensions where
polynomial reconstruction is complex.

Discrete Equivalence of Adjoint Neumann-Dirichlet div-grad and grad-div Equations in Curvilinear 3D Domains


Yi Zhang, Varun Jain, Artur Palha, and Marc Gerritsma 


1 Introduction 


In $\mathbb{R}^d$, given a bounded domain $\Omega$ with Lipschitz boundary $\partial\Omega$ and
$\hat{\sigma}_n \in H^{-1/2}(\partial\Omega) = \operatorname{tr} H(\operatorname{div}, \Omega)$, $\omega \in H^1(\Omega)$ solves the Neumann problem,

\[ \frac{\partial \omega}{\partial n} = \hat{\sigma}_n \ \text{on } \partial\Omega, \qquad -\operatorname{div}(\operatorname{grad} \omega) + \omega = 0 \ \text{in } \Omega, \tag{1} \]
if and only if $\sigma \in H(\operatorname{div}, \Omega)$ which solves the Dirichlet problem,
\[ \sigma \cdot n = \hat{\sigma}_n \ \text{on } \partial\Omega, \qquad -\operatorname{grad}(\operatorname{div} \sigma) + \sigma = 0 \ \text{in } \Omega, \tag{2} \]

satisfies $\sigma = \operatorname{grad} \omega$ [3]. This is obvious at the continuous level. The question is
whether we can find a set of finite dimensional function spaces such that $\sigma^h = \operatorname{grad} \omega^h$
holds if $\omega^h$ and $\sigma^h$ solve the discrete Neumann and Dirichlet problems
respectively. The answer is yes.
Throughout this paper, we restrict ourselves to $\mathbb{R}^3$. We will first construct the
primal polynomial spaces and their algebraic dual representations, and then use
them to discretize problems (1) and (2) such that the identity $\sigma^h = \operatorname{grad} \omega^h$ holds
at the discrete level in any curvilinear domain for any polynomial approximation
degree. This work extends [7, 9], where similar dual Neumann-Dirichlet problems
are considered, to 3-dimensional space. These primal spaces and their algebraic
dual representations can be ideal for the so-called mimetic or structure-preserving
discretizations [1, 4, 8, 11, 12]. Together with their trace spaces, they can be used
for the hybrid finite element methods which first decompose the domains into
discontinuous elements and then connect them with Lagrange multipliers living in the
trace spaces [2, 13, 14].

The outline of this paper is as follows: In Sect. 2, we introduce the construction 
of polynomial spaces and their algebraic dual representations. The discrete formu- 
lations of the Neumann-Dirichlet problems and the proof of their equivalence at 
the discrete level follow in Sect.3. A 3-dimensional numerical test case is then 
presented in Sect. 4. Finally, conclusions are drawn in Sect. 5. 


2 Function Spaces 


2.1 Primal Polynomial Spaces 


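Let $-1 = \xi^i_0 < \xi^i_1 < \cdots < \xi^i_{I^i} = 1$, $i = 1, 2, 3$, be three partitionings of $[-1, 1]$.
The associated Lagrange polynomials are

\[ h^i_j(\xi^i) = \prod_{m=0,\, m \neq j}^{I^i} \frac{\xi^i - \xi^i_m}{\xi^i_j - \xi^i_m}, \qquad j = 0, 1, \ldots, I^i. \]

They are polynomials of degree $I^i$ which satisfy the Kronecker delta property,
$h^i_j(\xi^i_k) = \delta_{jk}$. The associated edge functions can be derived as [6],

\[ e^i_j(\xi^i) = -\sum_{k=0}^{j-1} \frac{dh^i_k(\xi^i)}{d\xi^i}, \qquad j = 1, 2, \ldots, I^i, \]

which are polynomials of degree $I^i - 1$. Edge functions also satisfy the Kronecker
delta property, but in the integral sense,

\[ \int_{\xi^i_{k-1}}^{\xi^i_k} e^i_j(\xi^i)\, d\xi^i = \delta_{jk}. \]

Consider a reference domain $\Omega_{ref}|_{\xi^1 \xi^2 \xi^3} = [-1, 1]^3$. With the tensor product,
we can construct the finite dimensional scalar function space $P^{I^1,I^2,I^3}$ spanned by
polynomial basis functions

\[ \{ h_i(\xi^1) h_j(\xi^2) h_k(\xi^3) \}, \]
and the vector-valued function space $L^{I^1,I^2,I^3}$ spanned by polynomial basis functions
\[ \{ e_i(\xi^1) h_j(\xi^2) h_k(\xi^3),\ h_i(\xi^1) e_j(\xi^2) h_k(\xi^3),\ h_i(\xi^1) h_j(\xi^2) e_k(\xi^3) \}. \]

Let $\omega^h \in P^{I^1,I^2,I^3}$ be

\[ \omega^h = \sum_{i=0}^{I^1} \sum_{j=0}^{I^2} \sum_{k=0}^{I^3} \omega_{i,j,k}\, h_i(\xi^1) h_j(\xi^2) h_k(\xi^3). \tag{3} \]

Due to the way of constructing the edge functions, we can easily derive
$\rho^h = \operatorname{grad} \omega^h \in L^{I^1,I^2,I^3}$,

\[ \rho^h = \operatorname{grad} \omega^h = (\rho_1, \rho_2, \rho_3)^T, \]

where [6],

\[ \rho_1 = \sum_{i=1}^{I^1} \sum_{j=0}^{I^2} \sum_{k=0}^{I^3} (\omega_{i,j,k} - \omega_{i-1,j,k})\, e_i(\xi^1) h_j(\xi^2) h_k(\xi^3), \]
\[ \rho_2 = \sum_{i=0}^{I^1} \sum_{j=1}^{I^2} \sum_{k=0}^{I^3} (\omega_{i,j,k} - \omega_{i,j-1,k})\, h_i(\xi^1) e_j(\xi^2) h_k(\xi^3), \]
\[ \rho_3 = \sum_{i=0}^{I^1} \sum_{j=0}^{I^2} \sum_{k=1}^{I^3} (\omega_{i,j,k} - \omega_{i,j,k-1})\, h_i(\xi^1) h_j(\xi^2) e_k(\xi^3). \]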

Let $\boldsymbol{\omega}$, $\boldsymbol{\rho}$ be the vectors of expansion coefficients of $\omega^h$, $\rho^h$. We can obtain

\[ \boldsymbol{\rho} = \mathbb{E} \boldsymbol{\omega}, \tag{4} \]

where $\mathbb{E}$ is called the incidence matrix. The incidence matrix is very sparse and
contains only $\pm 1$ as non-zero entries. If we squeeze, stretch or distort the domain, of
course, the polynomial basis functions change, but the incidence matrix will remain
the same. It only depends on the topology of the mesh and the numbering of the
degrees of freedom. And it is exact. In other words, it introduces no extra error.
All these features make it an excellent discrete counterpart of the grad operator.
Examples of incidence matrices can be found in [8, 10-12].

For a comprehensive explanation of these polynomial basis functions, we refer
to [6]. In isogeometric analysis, tensor-product B-splines with similar properties
have been developed, see, for example [5]. For tetrahedral elements, an analogous
development can be found in [15].
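
A one-dimensional analogue of relation (4) shows why the incidence matrix contains only the entries -1 and +1 and is independent of the geometry. The sketch below assumes a single direction with nodes numbered from 0 to n; the names `incidence_matrix_1d` and `apply_matrix` are hypothetical.

```python
def incidence_matrix_1d(n):
    """n x (n+1) matrix: row i-1 maps nodal values to w_i - w_{i-1}, i = 1, ..., n."""
    rows = []
    for i in range(1, n + 1):
        row = [0] * (n + 1)
        row[i - 1], row[i] = -1, 1
        rows.append(row)
    return rows


def apply_matrix(matrix, vector):
    """Plain matrix-vector product for small dense lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


E = incidence_matrix_1d(3)
nodal_coefficients = [1, 4, 9, 16]      # expansion coefficients of omega^h
edge_coefficients = apply_matrix(E, nodal_coefficients)
print(E)
print(edge_coefficients)
```

The resulting edge coefficients are the differences of neighbouring nodal coefficients, which by the integral Kronecker delta property of the edge functions are exactly the expansion coefficients of the derivative.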

From (3), we can derive the trace of $\omega^h$, for example, on the back boundary of
$\Omega_{ref}$, $\Gamma_b = \{ \xi^1 = -1,\ \xi^2, \xi^3 \in [-1, 1] \}$,

\[ \operatorname{tr}_b \omega^h = \sum_{j=0}^{I^2} \sum_{k=0}^{I^3} \omega_{0,j,k}\, h_0(-1) h_j(\xi^2) h_k(\xi^3). \]

Let $\boldsymbol{\omega}_b$ be the vector of expansion coefficients of $\operatorname{tr}_b \omega^h$. Clearly, there exists a linear
operator $\mathbb{N}_b$ such that

\[ \boldsymbol{\omega}_b = \mathbb{N}_b \boldsymbol{\omega}. \]
The same process can be carried out for the other boundaries. If we collect the traces
of $\omega^h$ on all boundaries and combine their vectors of expansion coefficients and
corresponding linear operators, we can eventually obtain


yr = N o, 


where the matrix N, like IE, is sparse and only depends on the topology of the mesh 
and the numbering of the degrees of freedom. Furthermore, it contains only 1 as 
non-zero entries. An example of N can be found in [7]. Now, we can conclude that 


1 у2 уз IRP. . 
the trace space, P! ^^ = tr P! 1!” is given аз 


1 72 73 2 73 2 73 1 73 1 73 172 1 72 
PS = РГ UP ОР! UPT p gs e. 


where р is the s d by !Ag(—-DA;(£?)ny (E? pp is th 
= pace spanned Бу |по(—П (82) (3) |, PL ^ is the 


space spanned by $\{ h_{I^1}(1) h_j(\xi^2) h_k(\xi^3) \}$ and so on. Notice that the polynomial
basis functions in $\{ h_0(-1) h_j(\xi^2) h_k(\xi^3) \}$ are exactly the same as those in
$\{ h_{I^1}(1) h_j(\xi^2) h_k(\xi^3) \}$ because $h_0(-1) = h_{I^1}(1) = 1$. But here we still distinguish
them because they represent basis functions at different boundaries.

2.2 Algebraic Dual Polynomial Spaces


{ 1 72 73 Я ; 
We first consider the space P! >. Let Мр be the symmetric mass matrix, for 
example, 


Mp;, АЗАСЫ) 2+1), Ет) т) 5 


| [ [ hi Eh; (E2)hg Eh Ehm (Ев, (E?) de dg?d£?. 
ref 


The associated algebraic dual polynomial representations, or simply dual polyno- 
mials, are linear combinations of the polynomial basis functions, or simply primal 
polynomials, defined in the previous section, 


[оо@!, 82, 3), inp EE] 
= [PoE not no, + hn Gne Gn | М". 


These dual polynomials are always well-defined. This is because the primal polynomials
are linearly independent. So the mass matrix $\mathbb{M}_P$ is injective and surjective,
therefore invertible. Let the finite dimensional space spanned by the dual polynomials
be denoted by $\tilde{P}^{I^1,I^2,I^3}$. We say $\tilde{P}^{I^1,I^2,I^3}$ is the algebraic dual space of the primal
space $P^{I^1,I^2,I^3}$. Note that $P^{I^1,I^2,I^3}$ and $\tilde{P}^{I^1,I^2,I^3}$ actually represent the same space.
The change of basis functions only leads to a different representation. Therefore, we
also call the algebraic dual space a dual representation. Let $\tilde{\mathbb{M}}_P$ be the mass matrix
of $\tilde{P}^{I^1,I^2,I^3}$, we can easily see that

\[ \tilde{\mathbb{M}}_P \mathbb{M}_P = \mathbb{I}, \tag{5} \]

where $\mathbb{I}$ is the identity matrix. Similarly, we can derive the algebraic dual space
$\tilde{L}^{I^1,I^2,I^3}$ of the primal space $L^{I^1,I^2,I^3}$. Let $\mathbb{M}_L$ and $\tilde{\mathbb{M}}_L$ be their mass matrices, we
have

\[ \tilde{\mathbb{M}}_L \mathbb{M}_L = \mathbb{I}. \tag{6} \]
If $\rho^h \in L^{I^1,I^2,I^3}$, then $\sigma^h$, whose vector of expansion coefficients $\boldsymbol{\sigma}$ satisfies
\[ \boldsymbol{\sigma} = \mathbb{M}_L \boldsymbol{\rho}, \tag{7} \]
will be the representation of $\rho^h$ in the algebraic dual space $\tilde{L}^{I^1,I^2,I^3}$.

To explain how the algebraic dual space of the trace space is derived,
we take $P_b^{I^2,I^3}$ as example. We already know that $P_b^{I^2,I^3}$ is a space spanned by
primal polynomials $\{ h_0(-1) h_j(\xi^2) h_k(\xi^3) \}$. With these primal polynomials, we can
compute its mass matrix, denoted by $\mathbb{M}_b$. The dual polynomials are then computed
by


[йоло(—1,#°, 89), «+, o CL 885] 


= [noc Diii, 5, лос һәл) | Ms. 


~y2 73 —— 
The algebraic dual space р! 1 lig spanned by dual polynomials [ioa (21,8 6" | . 


T 1 72 73 Ў 
The algebraic dual space of the trace space P? ,/ ^" eventually can be written as 


еті 72 73 ~y2 3 wy wy ey ey el [2 
Prep ue uP ШВ ОР UP, 


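The divergence of $\sigma^h \in \tilde{L}^{I^1,I^2,I^3}$ can be computed with the help of the boundary value
$\hat{\sigma}^h$ in the algebraic dual trace space. With vector proxies, it can be written as

\[ \operatorname{div} \sigma^h = \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \boldsymbol{\sigma}. \tag{8} \]

A detailed introduction of algebraic dual polynomial spaces is given in [9].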


2.3 Function Spaces in Curvilinear Domains 


So far, all polynomial spaces are defined only in the reference domain
$\Omega_{ref}|_{\xi^1 \xi^2 \xi^3} = [-1, 1]^3$. Consider an arbitrary domain $\Omega$ and a $C^1$ diffeomorphism
$\Phi : \Omega_{ref}|_{\xi^1 \xi^2 \xi^3} \to \Omega|_{x^1 x^2 x^3}$. In $\Omega$, the primal polynomials change. Therefore, the
mass matrices will also change. But the process of constructing dual polynomials
does not change. And as we mentioned before, the metric-independent incidence
matrix $\mathbb{E}$ and the matrix $\mathbb{N}$ remain the same. The way of converting polynomials in
the Cartesian domain into those in curvilinear domains follows the general coordinate
transformation process, for example, see [16].

From now on, notations mentioned in this section not only refer to the reference
domain $\Omega_{ref}$, but also refer to the physical domain $\Omega$.

3 Weak Formulations


3.1 Discrete Neumann Problem 


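With integration by parts, we can derive the weak formulation of the Neumann
problem, (1), written as: For given $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$, find $\omega \in H^1(\Omega)$ such that

\[ (\operatorname{grad} \omega, \operatorname{grad} \tilde{\omega})_{L^2} + (\omega, \tilde{\omega})_{L^2} = \langle \operatorname{tr} \tilde{\omega}, \hat{\sigma} \rangle, \quad \forall \tilde{\omega} \in H^1(\Omega). \tag{9} \]

Note that on the right hand side, we use $\langle \cdot, \cdot \rangle$ to represent the duality pairing between
$\operatorname{tr} \tilde{\omega} \in H^{1/2}(\partial\Omega)$ and $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$. We use the finite dimensional space $P^{I^1,I^2,I^3}$
to approximate the space $H^1(\Omega)$ and use the algebraic dual trace space to
approximate the space $H^{-1/2}(\partial\Omega)$. Then we obtain

\[ (\operatorname{grad} \omega^h, \operatorname{grad} \tilde{\omega}^h)_{L^2} = \tilde{\boldsymbol{\omega}}^T \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega}, \]

and
\[ \int_{\partial\Omega} \operatorname{tr} \tilde{\omega}^h\, \hat{\sigma}^h \, d\Gamma = \tilde{\boldsymbol{\omega}}^T \mathbb{N}^T \hat{\boldsymbol{\sigma}}, \]

which eventually leads to the discrete formulation of (9),

\[ \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} + \mathbb{M}_P \boldsymbol{\omega} = \mathbb{N}^T \hat{\boldsymbol{\sigma}}. \tag{10} \]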


3.2 Discrete Dirichlet Problem 


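For the Dirichlet problem, (2), the weak formulation is given as: For given $\hat{\sigma} \in H^{-1/2}(\partial\Omega)$,
find $\sigma \in H(\operatorname{div}, \Omega)$, $\operatorname{tr} \sigma = \hat{\sigma}$ such that

\[ (\operatorname{div} \sigma, \operatorname{div} \tilde{\sigma})_{L^2} + (\sigma, \tilde{\sigma})_{L^2} = 0, \quad \forall \tilde{\sigma} \in H_0(\operatorname{div}, \Omega). \tag{11} \]

We use the algebraic dual space $\tilde{L}^{I^1,I^2,I^3}$ to approximate $H(\operatorname{div}, \Omega)$. With $\hat{\sigma}^h$ in the
algebraic dual trace space given and (8), we obtain

\[ (\operatorname{div} \sigma^h, \operatorname{div} \tilde{\sigma}^h)_{L^2} = -\tilde{\boldsymbol{\sigma}}^T \mathbb{E} \tilde{\mathbb{M}}_P \big( \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \boldsymbol{\sigma} \big), \]

and
\[ (\sigma^h, \tilde{\sigma}^h)_{L^2} = \tilde{\boldsymbol{\sigma}}^T \tilde{\mathbb{M}}_L \boldsymbol{\sigma}. \]
Therefore, the discrete formulation of (11) is written as

\[ \mathbb{E} \tilde{\mathbb{M}}_P \mathbb{E}^T \boldsymbol{\sigma} + \tilde{\mathbb{M}}_L \boldsymbol{\sigma} = \mathbb{E} \tilde{\mathbb{M}}_P \mathbb{N}^T \hat{\boldsymbol{\sigma}}. \tag{12} \]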


3.3 Equivalence Between Discrete Formulations 


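Now it is time to check if the equivalence between (1) and (2) holds at the discrete
level. In other words, it is time to check if the statement that $\omega^h$ solves (10) if and
only if $\sigma^h = \operatorname{grad} \omega^h$ solves (12) is correct.

From (4) and (7), we know that $\boldsymbol{\sigma}$,

\[ \boldsymbol{\sigma} = \mathbb{M}_L \mathbb{E} \boldsymbol{\omega}, \tag{13} \]

is the vector representation of $\operatorname{grad} \omega^h$ in the dual space. If we insert (13) into (12),
we obtain

\[ \mathbb{E} \tilde{\mathbb{M}}_P \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} + \tilde{\mathbb{M}}_L \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} = \mathbb{E} \tilde{\mathbb{M}}_P \mathbb{N}^T \hat{\boldsymbol{\sigma}}. \tag{14} \]
From (10), we know that
\[ \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} = -\mathbb{M}_P \boldsymbol{\omega} + \mathbb{N}^T \hat{\boldsymbol{\sigma}}. \tag{15} \]
By inserting (15) into (14), we get
\[ \mathbb{E} \tilde{\mathbb{M}}_P \big( -\mathbb{M}_P \boldsymbol{\omega} + \mathbb{N}^T \hat{\boldsymbol{\sigma}} \big) + \tilde{\mathbb{M}}_L \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} = \mathbb{E} \tilde{\mathbb{M}}_P \mathbb{N}^T \hat{\boldsymbol{\sigma}}. \tag{16} \]

From (5) and (6), we know that (16) holds, which proves the equivalence.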
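
The final step can be spelled out as a short worked computation. The notation below is an assumption of this sketch: $\mathbb{E}$ and $\mathbb{N}$ denote the incidence and trace matrices, $\mathbb{M}_P$, $\mathbb{M}_L$ the primal mass matrices, a tilde marks the mass matrices of the dual representations, and $\mathbb{I}$ is the identity. Only the identities (5) and (6) are used.

```latex
\begin{aligned}
\mathbb{E}\tilde{\mathbb{M}}_P\bigl(-\mathbb{M}_P\boldsymbol{\omega}
      + \mathbb{N}^T\hat{\boldsymbol{\sigma}}\bigr)
  + \tilde{\mathbb{M}}_L\mathbb{M}_L\mathbb{E}\boldsymbol{\omega}
 &= -\mathbb{E}\bigl(\tilde{\mathbb{M}}_P\mathbb{M}_P\bigr)\boldsymbol{\omega}
    + \mathbb{E}\tilde{\mathbb{M}}_P\mathbb{N}^T\hat{\boldsymbol{\sigma}}
    + \bigl(\tilde{\mathbb{M}}_L\mathbb{M}_L\bigr)\mathbb{E}\boldsymbol{\omega} \\
 &= -\mathbb{E}\boldsymbol{\omega}
    + \mathbb{E}\tilde{\mathbb{M}}_P\mathbb{N}^T\hat{\boldsymbol{\sigma}}
    + \mathbb{E}\boldsymbol{\omega} \\
 &= \mathbb{E}\tilde{\mathbb{M}}_P\mathbb{N}^T\hat{\boldsymbol{\sigma}} .
\end{aligned}
```

The two terms containing $\mathbb{E}\boldsymbol{\omega}$ cancel, so the left-hand side reduces to the right-hand side for any mesh deformation, since only the mass matrices depend on the geometry.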


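If the equivalence holds, the relation $\|\omega^h\|_{H^1(\Omega)} = \|\sigma^h\|_{H(\operatorname{div},\Omega)}$ should also be
satisfied. To prove this, we have

\[ \begin{aligned}
\|\sigma^h\|^2_{H(\operatorname{div},\Omega)}
 &= \boldsymbol{\sigma}^T \tilde{\mathbb{M}}_L \boldsymbol{\sigma}
  + \big( \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \boldsymbol{\sigma} \big)^T \tilde{\mathbb{M}}_P \big( \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \boldsymbol{\sigma} \big) \\
 &\overset{(13)}{=} \boldsymbol{\omega}^T \mathbb{E}^T \mathbb{M}_L \tilde{\mathbb{M}}_L \mathbb{M}_L \mathbb{E} \boldsymbol{\omega}
  + \big( \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} \big)^T \tilde{\mathbb{M}}_P \big( \mathbb{N}^T \hat{\boldsymbol{\sigma}} - \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega} \big) \\
 &\overset{(10)}{=} \boldsymbol{\omega}^T \mathbb{E}^T \mathbb{M}_L \mathbb{E} \boldsymbol{\omega}
  + \boldsymbol{\omega}^T \mathbb{M}_P \tilde{\mathbb{M}}_P \mathbb{M}_P \boldsymbol{\omega}
  = \|\omega^h\|^2_{H^1(\Omega)},
\end{aligned} \]

where we constantly use (5) and (6) and the fact that mass matrices are symmetric.

4 Numerical Test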


Consider the mapping which maps the Cartesian reference domain $\Omega_{ref}|_{\xi^1 \xi^2 \xi^3} = [-1, 1]^3$
into the physical domain $\Omega|_{x^1 x^2 x^3} = [0, 1]^3$ by

\[ x^i = \frac{1}{2} + \frac{1}{2} \Big( \xi^i + c \prod_{j} \sin(\pi \xi^j) \Big), \qquad i = 1, 2, 3. \]

When the deformation coefficient $c = 0$, the domain $\Omega$ is Cartesian. Otherwise the
domain is curvilinear, meaning that a curvilinear coordinate system parametrizes $\Omega$.
Examples of such curvilinear domains in $\mathbb{R}^2$ are shown in Fig. 1.


A manufactured solution of the Neumann problem, (1), is

\[ \omega_{exact} = e^{x^1} + e^{x^2} + e^{x^3}. \]

Clearly, $\sigma_{exact} = \operatorname{grad} \omega_{exact} = (e^{x^1}, e^{x^2}, e^{x^3})^T$ solves the Dirichlet problem, (2).


In the domains of different deformation coefficient $c$, with the boundary condition
$\hat{\sigma} = \operatorname{tr} \sigma_{exact}$ imposed, we solve the discrete formulations (10) and (12) using
Gauss-Lobatto-Legendre (GLL) polynomial spaces of degree $I^1 = I^2 = I^3 = N$.

The results of the $L^2$-error of $(\sigma^h - \operatorname{grad} \omega^h)$ are shown in Fig. 2 (Left) where
we can see that the relation $\sigma^h = \operatorname{grad} \omega^h$ is preserved up to the machine precision.
With the growth of the polynomial degree, the error increases slowly because of
the accumulation of the machine error as the number of degrees of freedom grows
significantly.

In Table 1, the results of the $H^1$-norm of $\omega^h$ and $H(\operatorname{div})$-norm of $\sigma^h$ are
presented. It is shown that the relation $\|\omega^h\|_{H^1(\Omega)} = \|\sigma^h\|_{H(\operatorname{div},\Omega)}$ holds for all
polynomial degrees irrespective of whether we use the Cartesian domain, $c = 0$, or


Fig. 1 Curvilinear domains for c = 0.15 (Left) and c = 0.3 (Right) in $\mathbb{R}^2$. The gray lines illustrate
the coordinate lines

Fig. 2 The $L^2$-error of $(\sigma^h - \operatorname{grad} \omega^h)$ (Left) and the p-convergence of the $H^1$-error of $\omega^h$
(Right) for N = 2, 4, ..., 20 and c = 0, 0.15, 0.3


Table 1 The $H^1$-norm of $\omega^h$ and $H(\operatorname{div})$-norm of $\sigma^h$ for polynomial degree N = 2, 4, ..., 20
and deformation coefficient c = 0, 0.15, 0.3


c — 0.15 


И-П 
“|, hla eha 

(div) H H (div) H H (div) 
2 6.7381947027 
4 5.8849807780| 5.8849807780 
6 6.0721137212| 6.0721137212 
8 6.0730525346| 6.0730525346 
10 6.0730648440| 6.0730648440 
12 6.0730653428| 6.0730653428 
14 6.0730653663| 6.0730653663 
16 6.0730653667 | 6.0730653667 
18 6.0730653668| 6.0730653668 


20 | 6.0730653668 | 6.0730653668 | 6.0730653668| 6.0730653668| 6.0730653668 | 6.0730653668 


curvilinear domains, $c = 0.15, 0.3$. It is also seen that the results always converge to
the analytical value $\|\omega_{exact}\|_{H^1} = \|\sigma_{exact}\|_{H(\operatorname{div})} = 6.0730653668$. The p-convergence
for the $H^1$-error of $\omega^h$, therefore also for the $H(\operatorname{div})$-error of $\sigma^h$, is shown in Fig. 2
(Right), which shows the exponential convergence of the method.
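
The analytical norm quoted above can be checked in closed form. The sketch below assumes the manufactured solution is the sum of the three exponentials on the unit cube, so that all integrals factor into one-dimensional integrals of $e^{x}$ and $e^{2x}$ over [0, 1]; the variable names are hypothetical.

```python
import math

e = math.e
int_exp = e - 1.0                    # integral of e^x over [0, 1]
int_exp_sq = 0.5 * (e * e - 1.0)     # integral of e^(2x) over [0, 1]

# squared L2 norm of omega: three squared terms plus six mixed products
l2_sq = 3.0 * int_exp_sq + 6.0 * int_exp ** 2
# squared L2 norm of grad omega: one squared exponential per component
grad_sq = 3.0 * int_exp_sq
h1_norm = math.sqrt(l2_sq + grad_sq)

# sigma = grad omega and div sigma = omega, so the H(div) norm uses the same two pieces
hdiv_norm = math.sqrt(grad_sq + l2_sq)
print(f"{h1_norm:.10f} {hdiv_norm:.10f}")
```

Both norms are built from the same two integrals, which is the continuous counterpart of the discrete norm identity shown in Sect. 3.3.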


5 Conclusions 


By constructing and using primal polynomial spaces and their algebraic dual
representations both in the domain and on the boundary, we successfully preserve
the equivalence of the div-grad Neumann problem and the grad-div Dirichlet
problem at the discrete level in 3-dimensional curvilinear domains. This suggests the
further usage of these spaces in structure-preserving methods and hybrid methods.

A Conservative Hybrid Method for Darcy Flow


Varun Jain, Joél Fisser, Artur Palha, and Marc Gerritsma 


1 Introduction 


Hybrid formulations [1, 3, 10] are classical domain decomposition methods which 
reduce the problem of solving one global system to many small local systems. The 
local systems can then be efficiently solved independently of each other in parallel. 

In this work we present a hybrid mimetic spectral element formulation to solve
Darcy flow. We follow [8], which renders the constraints on the divergence of the mass
flux, the pressure gradient and the inter-element continuity metric-free. The resulting
system is extremely sparse and shows a reduced growth in condition number
compared to a non-hybrid system.

This document is structured as follows: In Sect. 2 we define the weak formulation
for Darcy flow. The basis functions are introduced in Sect. 3. The evaluation of the
weighted inner product and duality pairings is discussed in Sect. 4. In Sect. 5 we
discuss the formulation of the discrete algebraic system. In Sect. 6 we present results
for a test case taken from [7].


2 Darcy Flow Formulation


For $\Omega \subset \mathbb{R}^d$, where d is the dimension of the domain, the governing equations for
Darcy flow are given by

$$
\begin{cases}
u + \mathbb{A}\,\nabla p = 0 \\
\nabla\cdot u = f
\end{cases}
\ \text{in } \Omega
\qquad\text{and}\qquad
\begin{cases}
p = \hat{p} & \text{on } \Gamma_D \\
u\cdot n = \hat{u}_n & \text{on } \Gamma_N
\end{cases}
$$

where u is the velocity, p is the pressure, f the prescribed RHS term, $\mathbb{A}$ is a $d \times d$
symmetric positive definite matrix, and $\hat{p}$ and $\hat{u}_n$ are the prescribed pressure and flux
boundary conditions, respectively.


2.1 Notations 


For $f, g \in L^2(\Omega)$, $(f, g)_\Omega$ denotes the usual $L^2$-inner product.
For vector-valued functions in $L^2$ we define the weighted inner product by,

$$ (v, u)_{\mathbb{A}^{-1},\Omega} = \int_\Omega \left(v, \mathbb{A}^{-1} u\right) \mathrm{d}\Omega, \qquad (1) $$

where $(\cdot, \cdot)$ denotes the pointwise inner product.

Duality pairing, denoted by $\langle\cdot, \cdot\rangle_\Omega$, is the outcome of a linear functional on
$L^2(\Omega)$ acting on elements from $L^2(\Omega)$.

Let $\Omega_K$ be a disjoint partitioning of $\Omega$ with total number of elements K, and let $K_i$
be any element in $\Omega_K$, such that $K_i \in \Omega_K$. We define the following broken Sobolev
spaces [2]: $H(\mathrm{div};\Omega_K) = \prod_i H(\mathrm{div};K_i)$ and $H^{1/2}(\partial\Omega_K) = \prod_i H^{1/2}(\partial K_i)$.


2.2 Weak Formulation 


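The Lagrange functional for Darcy flow is defined as,

$$
\begin{aligned}
\mathcal{L}(u, p, \lambda; f) ={}& \frac{1}{2}\int_{\Omega_K} u^T \mathbb{A}^{-1} u \,\mathrm{d}\Omega_K - \int_{\Omega_K} p\,(\nabla\cdot u - f)\,\mathrm{d}\Omega_K \\
& + \int_{\partial\Omega_K\setminus\Gamma_D} \lambda\,(u\cdot n)\,\mathrm{d}\Gamma + \int_{\Gamma_D} \hat{p}\,(u\cdot n)\,\mathrm{d}\Gamma - \int_{\Gamma_N} \lambda\,\hat{u}_n\,\mathrm{d}\Gamma
\end{aligned}
$$

The variational problem is then given by: For given $f \in L^2(\Omega_K)$, $\hat{p} \in H^{1/2}(\Gamma_D)$
and $\hat{u}_n \in H^{-1/2}(\Gamma_N)$, find $u \in H(\mathrm{div};\Omega_K)$, $p \in L^2(\Omega_K)$,
$\lambda \in H^{1/2}(\partial\Omega_K)$, such that,

$$
\begin{aligned}
(v, u)_{\mathbb{A}^{-1},\Omega_K} - \langle \nabla\cdot v, p\rangle_{\Omega_K} + \langle (v\cdot n), \lambda\rangle_{\partial\Omega_K\setminus\Gamma_D} &= -\langle v\cdot n, \hat{p}\rangle_{\Gamma_D} && \forall\, v \in H(\mathrm{div};\Omega_K) \\
-\langle q, \nabla\cdot u\rangle_{\Omega_K} &= -\langle q, f\rangle_{\Omega_K} && \forall\, q \in L^2(\Omega_K) \\
\langle \mu, (u\cdot n)\rangle_{\partial\Omega_K\setminus\Gamma_D} &= \langle \mu, \hat{u}_n\rangle_{\Gamma_N} && \forall\, \mu \in H^{1/2}(\partial\Omega_K)
\end{aligned}
\qquad (2)
$$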


3 Basis Functions 
3.1 Primal and Dual Nodal Degrees of Freedom 


Let $\xi_j$, j = 0, 1, ..., N, be the N + 1 Gauss-Lobatto-Legendre (GLL) points in
$I = [-1, 1]$. The Lagrange polynomials $h_i(\xi)$ through $\xi_j$, of degree N, given by,

$$ h_i(\xi) = \frac{(\xi^2 - 1)\, L_N'(\xi)}{N(N+1)\, L_N(\xi_i)\,(\xi - \xi_i)} $$

form the 1D primal nodal polynomials which satisfy, $h_i(\xi_j) = \delta_{ij}$.
Let $a^h$ and $b^h$ be two polynomials expanded in terms of $h_i(\xi)$. The $L^2$-inner
product is then given by,

$$ (a^h, b^h)_I = a^T M^{(0)} b, \quad\text{where}\quad M^{(0)}_{i,j} = \int_{-1}^{1} h_i(\xi)\, h_j(\xi)\, \mathrm{d}\xi, $$

and, $a = [a_0\ a_1 \dots a_N]$ and $b = [b_0\ b_1 \dots b_N]$ are the nodal degrees of freedom.
We define the algebraic dual degrees of freedom, $\tilde{a}$, such that the duality pairing is
simply the vector dot product between primal and dual degrees of freedom,

$$ \langle a^h, b^h\rangle_I := \tilde{a}^T b = a^T M^{(0)} b \quad\Rightarrow\quad \tilde{a} = M^{(0)} a. $$

Thus, the dual degrees of freedom are linear functionals of primal degrees of
freedom.


3.2 Primal and Dual Edge Degrees of Freedom


The edge polynomials, for the N edges between N + 1 GLL points $(\xi_{j-1}, \xi_j)$, of
polynomial degree N - 1, are defined as [4],

$$ e_j(\xi) = -\sum_{k=0}^{j-1} \frac{\mathrm{d}h_k}{\mathrm{d}\xi}(\xi), \quad\text{such that}\quad \int_{\xi_{i-1}}^{\xi_i} e_j(\xi)\,\mathrm{d}\xi = \delta_{ij}. $$

Let $p^h$ and $q^h$ be two polynomials expanded in edge basis functions. The inner
product in $L^2$ space is given by,

$$ (p^h, q^h)_I = p^T M^{(1)} q, \quad\text{where}\quad M^{(1)}_{i,j} = \int_{-1}^{1} e_i(\xi)\, e_j(\xi)\, \mathrm{d}\xi, $$

and, $p = [p_1\ p_2 \dots p_N]$ and $q = [q_1\ q_2 \dots q_N]$ are the edge degrees of freedom.
As before, we define the dual degrees of freedom such that,

$$ \langle p^h, q^h\rangle_I := \tilde{p}^T q = p^T M^{(1)} q \quad\Rightarrow\quad \tilde{p} = M^{(1)} p. $$

A similar construction can be used for dual degrees of freedom in higher dimensions.
For construction of the dual degrees of freedom in 2D see [8] and for 3D
see [9].
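
The construction of the nodal mass matrix and the algebraic dual degrees of freedom in Sect. 3.1 can be illustrated with a short numerical sketch. The code below is not taken from the paper: the helper names (`gll_nodes`, `lagrange_values`, `nodal_mass_matrix`) are hypothetical, the GLL points are obtained as the endpoints plus the roots of $L_N'$, and the integrals are evaluated with Gauss-Legendre quadrature, which is one possible choice among several.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gll_nodes(N):
    """Return the N + 1 GLL points: the endpoints -1, 1 and the roots of L_N'."""
    interior = leg.Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(interior.real), [1.0]))


def lagrange_values(nodes, x):
    """Evaluate every Lagrange polynomial h_i through `nodes` at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    H = np.ones((len(nodes), len(x)))
    for i, xi in enumerate(nodes):
        for k, xk in enumerate(nodes):
            if k != i:
                H[i] *= (x - xk) / (xi - xk)
    return H


def nodal_mass_matrix(nodes):
    """M0[i, j] = integral of h_i * h_j over [-1, 1] (Gauss-Legendre quadrature)."""
    n = len(nodes)             # n = N + 1 Gauss points are exact up to degree 2N + 1
    xq, wq = leg.leggauss(n)
    H = lagrange_values(nodes, xq)
    return (H * wq) @ H.T


N = 4
nodes = gll_nodes(N)
M0 = nodal_mass_matrix(nodes)

a = nodes**2          # primal nodal dofs of a^h(xi) = xi^2
b = np.ones(N + 1)    # primal nodal dofs of b^h(xi) = 1
a_dual = M0 @ a       # algebraic dual dofs of a^h
pairing = a_dual @ b  # duality pairing, equal to the L2 inner product of xi^2 and 1
print(pairing)        # close to 2/3
```

Once the dual degrees of freedom are formed, the pairing is a plain dot product; the mass matrix enters only through the map from primal to dual coefficients.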


3.3 Differentiation of Nodal Polynomial Representation 


Let $a^h(\xi)$ be expanded in Lagrange polynomials, then

$$ \frac{\mathrm{d}}{\mathrm{d}\xi} a^h(\xi) = \frac{\mathrm{d}}{\mathrm{d}\xi} \sum_{i=0}^{N} a_i\, h_i(\xi) = \sum_{i=1}^{N} (a_i - a_{i-1})\, e_i(\xi). \qquad (3) $$

Therefore, taking the derivative of a polynomial involves two steps: First, take the
difference of degrees of freedom; second, change of basis from nodal to edge [4].
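
The two defining properties of the edge polynomials and the derivative rule (3) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it uses `numpy.poly1d` to represent the polynomials, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gll_nodes(N):
    """N + 1 GLL points: the endpoints and the roots of L_N'."""
    interior = leg.Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(interior.real), [1.0]))


def lagrange_polys(nodes):
    """Lagrange polynomials h_i through the nodes, as poly1d objects."""
    polys = []
    for i, xi in enumerate(nodes):
        p = np.poly1d(np.delete(nodes, i), True)  # roots at all other nodes
        polys.append(p / p(xi))                   # normalise so that h_i(xi_i) = 1
    return polys


def edge_polys(nodes):
    """Edge polynomials e_j = -(dh_0 + ... + dh_{j-1}), j = 1..N."""
    dh = [h.deriv() for h in lagrange_polys(nodes)]
    edges, partial = [], np.poly1d([0.0])
    for k in range(len(nodes) - 1):
        partial = partial - dh[k]
        edges.append(partial)
    return edges


N = 5
nodes = gll_nodes(N)
h = lagrange_polys(nodes)
e = edge_polys(nodes)  # e[j - 1] holds e_j

# Integral of e_j over the i-th edge (xi_{i-1}, xi_i); should be the identity.
edge_integrals = np.array(
    [[e[j].integ()(nodes[i + 1]) - e[j].integ()(nodes[i]) for j in range(N)]
     for i in range(N)]
)

# Derivative rule (3): differences of nodal dofs expanded in edge polynomials.
rng = np.random.default_rng(1)
a = rng.standard_normal(N + 1)
x = np.linspace(-1.0, 1.0, 11)
lhs = sum(a[i] * h[i].deriv()(x) for i in range(N + 1))
rhs = sum((a[i] - a[i - 1]) * e[i - 1](x) for i in range(1, N + 1))
```

The difference `a[i] - a[i - 1]` is purely topological; all metric information sits in the edge polynomials themselves.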


4 Discrete Inner Product and Duality Pairing 


For 2D domains, the higher dimensional primal bases are constructed using the
tensor product of the 1D basis.
For the weak formulation (2) we expand the velocity $u^h$ in primal edge basis as,

$$ u^h = \sum_{i=0}^{N}\sum_{j=1}^{N} u_{x\,i,j}\, h_i(\xi)\, e_j(\eta) + \sum_{i=1}^{N}\sum_{j=0}^{N} u_{y\,i,j}\, e_i(\xi)\, h_j(\eta), \qquad (4) $$

where $u_{x\,i,j}$ denotes the flux, $\int u\cdot n$, over the vertical edges and $u_{y\,i,j}$ the flux over
the horizontal edges, see Fig. 1.


4.1 Weighted Inner Product 


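Using (1) and the expansions in (4), the weighted inner product is evaluated as,

$$ (v^h, u^h)_{\mathbb{A}^{-1},\Omega_K} = \sum_{K_i} v_{K_i}^T\, M^{(1)}_{\mathbb{A}^{-1},K_i}\, u_{K_i}, $$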


Fig. 1 Discretized domain for K = 3 x 3, N = 3. The blue dots represent the pressure boundary
condition $\hat{p}$, and the blue edges represent the velocity boundary condition $\hat{u}_n$
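where, $u_{K_i}$ are the degrees of freedom in element $K_i$, and

$$
M^{(1)}_{\mathbb{A}^{-1},K_i} = \int_{K_i}
\begin{bmatrix} h_i(\xi)\, e_j(\eta) \\ e_i(\xi)\, h_j(\eta) \end{bmatrix}^T
\mathbb{A}^{-1}(\xi, \eta)
\begin{bmatrix} h_i(\xi)\, e_j(\eta) \\ e_i(\xi)\, h_j(\eta) \end{bmatrix}
\mathrm{d}K_i .
$$

For mapping of elements please refer to [6].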


4.2 Divergence of Velocity 


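Divergence of velocity, $\nabla\cdot u^h$, is evaluated using (3), but now for 2D,

$$
\begin{aligned}
\nabla\cdot u^h &= \frac{\partial}{\partial\xi} \sum_{i=0}^{N}\sum_{j=1}^{N} u_{x\,i,j}\, h_i(\xi)\, e_j(\eta) + \frac{\partial}{\partial\eta} \sum_{i=1}^{N}\sum_{j=0}^{N} u_{y\,i,j}\, e_i(\xi)\, h_j(\eta) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} \left(u_{x\,i,j} - u_{x\,i-1,j} + u_{y\,i,j} - u_{y\,i,j-1}\right) e_i(\xi)\, e_j(\eta)
\end{aligned}
\qquad (5)
$$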


For pressure we will use dual degrees of freedom. Therefore the weak constraint on
divergence of velocity is a duality pairing evaluated as,

$$ \langle q^h, \nabla\cdot u^h\rangle_{\Omega_K} = \sum_{K_i} \tilde{q}_{K_i}^T\, \mathbb{E}^{2,1}\, u_{K_i}, $$

where $\mathbb{E}^{2,1}$ represents the discrete divergence operator. It is an incidence matrix that
is metric-free and topological, and remains the same for each element in $\Omega_K$. For an
extensive discussion on the incidence matrix, see for instance [6]. For an element of
degree N = 3, $\mathbb{E}^{2,1}$ is a $9 \times 24$ matrix whose only nonzero entries are $\pm 1$, four
per row, as prescribed by (5).


4.3 Connectivity Matrix


The connectivity matrix ensures continuity of the velocity flux across the elements.
$\lambda$ is the interface variable defined between the elements, shown as red dots in Fig. 1.
$\lambda$ acts as a Lagrange multiplier that imposes the continuity constraint given by,

$$ \langle \lambda^h, (u^h\cdot n)\rangle_{\partial\Omega_K\setminus\Gamma_D} = \sum_{K_i} \lambda_{K_i}^T\, \mathbb{N}\, u_{K_i} = \lambda^T\, \mathbb{E}_N\, u, $$

where $\mathbb{N}$ is the discrete trace operator. It is a sparse matrix that consists of 1, -1
and 0 only. For construction of $\mathbb{N}$ please refer to [5]. $\mathbb{E}_N$ is the assembled $\mathbb{N}$ for all
elements. For K = 2 x 2, N = 2, $\mathbb{E}_N$ is shown in (6). The matrix size of $\mathbb{E}_N$ is
8 x 64, but it has only 16 non-zero entries. It is an extremely sparse matrix that
is metric-free, and the location of the $\pm 1$ valued entries depends only on the connection
between different elements.
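
Because the discrete divergence operator of Sect. 4.2 is purely topological, it can be assembled from index arithmetic alone. The sketch below builds such an incidence matrix directly from (5). The ordering of the degrees of freedom (all $u_x$ fluxes first, then all $u_y$ fluxes) and the function names are assumptions made for illustration; the paper's own ordering may differ, which would permute the columns but not change the structure.

```python
import numpy as np


def divergence_incidence(N):
    """Incidence matrix implementing (5) for one element of degree N.

    Assumed (hypothetical) dof ordering: u_x[i, j] for i = 0..N, j = 1..N,
    followed by u_y[i, j] for i = 1..N, j = 0..N.
    """
    n_ux = (N + 1) * N
    n_uy = N * (N + 1)

    def ux(i, j):
        return i * N + (j - 1)

    def uy(i, j):
        return n_ux + (i - 1) * (N + 1) + j

    def volume(i, j):
        return (i - 1) * N + (j - 1)

    E = np.zeros((N * N, n_ux + n_uy), dtype=int)
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            row = volume(i, j)
            E[row, ux(i, j)] = 1        # flux through the right edge
            E[row, ux(i - 1, j)] = -1   # flux through the left edge
            E[row, uy(i, j)] = 1        # flux through the top edge
            E[row, uy(i, j - 1)] = -1   # flux through the bottom edge
    return E


E21 = divergence_incidence(3)
print(E21.shape)  # (9, 24)
```

No basis function, quadrature rule or element mapping appears in the construction, which is why the same matrix can be reused for every element.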


(6) 


5 Discrete Formulation 


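Using the weighted inner product and duality pairings discussed in Sect. 4, we can
write the discrete form of the weak formulation in (2) as,

$$
\begin{bmatrix} B & \mathbb{E}_N^T \\ \mathbb{E}_N & 0 \end{bmatrix}
\begin{bmatrix} X \\ \lambda \end{bmatrix}
=
\begin{bmatrix} F \\ 0 \end{bmatrix},
\qquad (7)
$$

where, B is an invertible block diagonal matrix given by,

$$
B = \begin{bmatrix}
\begin{bmatrix} M^{(1)}_{\mathbb{A}^{-1},K_1} & \mathbb{E}^{2,1\,T} \\ \mathbb{E}^{2,1} & 0 \end{bmatrix} & & & \\
& \begin{bmatrix} M^{(1)}_{\mathbb{A}^{-1},K_2} & \mathbb{E}^{2,1\,T} \\ \mathbb{E}^{2,1} & 0 \end{bmatrix} & & \\
& & \ddots & \\
& & & \begin{bmatrix} M^{(1)}_{\mathbb{A}^{-1},K_K} & \mathbb{E}^{2,1\,T} \\ \mathbb{E}^{2,1} & 0 \end{bmatrix}
\end{bmatrix},
\qquad (8)
$$

$\mathbb{E}_N$ is as given in (6), X collects the element-wise unknowns $u_{K_i}$ and $\tilde{p}_{K_i}$, and F
collects the corresponding element-wise right-hand sides of (2), where $f_{K_i}$ are the
expansion coefficients of $f^h(x, y) = \sum_{i,j} f_{ij}\, e_i(x)\, e_j(y)$.

In (8), the mass matrix $M^{(1)}_{\mathbb{A}^{-1},K_i}$ is the only dense matrix and also the only
matrix that changes with each local element, $K_i$. $\mathbb{E}_N$ is a sparse incidence matrix
for the global system and $\mathbb{E}^{2,1}$ is a sparse incidence matrix for the local systems that
remains the same for each element.

Using the Schur complement method, the global system (7) can be reduced to
solve for $\lambda$ [1],

$$ \lambda = \left(\mathbb{E}_N B^{-1} \mathbb{E}_N^T\right)^{-1} \left(\mathbb{E}_N B^{-1} F\right). \qquad (9) $$

To evaluate $\lambda$ in (9) we need $B^{-1}$, which can be calculated efficiently by inverting
each block of B separately. This part is trivially parallelized. Once $\lambda$ is
determined, the solution in each element, $K_i$, can be evaluated independently of the
others.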
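
The Schur complement reduction can be sketched on a small synthetic system with the same block structure. Everything in the example below is a stand-in: the sizes are hypothetical, the local mass matrices are random symmetric positive definite matrices, and the incidence matrices are filled with random numbers rather than the actual 1, -1 and 0 entries. Only the algebra of the reduction is being illustrated, and the result is compared against a monolithic solve of the saddle-point system.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_u, n_p, n_lam = 4, 6, 3, 5      # hypothetical sizes: elements, local u, local p, interface
n_loc = n_u + n_p

E21 = rng.standard_normal((n_p, n_u))   # stand-in for E^{2,1}, shared by all elements

blocks = []
for _ in range(K):
    G = rng.standard_normal((n_u, n_u))
    M = G @ G.T + n_u * np.eye(n_u)     # SPD stand-in for the local mass matrix
    blocks.append(np.block([[M, E21.T], [E21, np.zeros((n_p, n_p))]]))

# Stand-in for E_N: couples interface unknowns to the velocity dofs only.
E_N = np.zeros((n_lam, K * n_loc))
for k in range(K):
    E_N[:, k * n_loc:k * n_loc + n_u] = rng.standard_normal((n_lam, n_u))

F = rng.standard_normal((K * n_loc, 1))

# Invert B block by block; each inversion is independent of the others.
B_inv = [np.linalg.inv(Bk) for Bk in blocks]


def apply_B_inv(V):
    """Apply the block-diagonal inverse to a matrix with K * n_loc rows."""
    V = V.reshape(K, n_loc, -1)
    return np.concatenate([B_inv[k] @ V[k] for k in range(K)], axis=0)


S = E_N @ apply_B_inv(E_N.T)                      # Schur complement E_N B^-1 E_N^T
lam = np.linalg.solve(S, E_N @ apply_B_inv(F))    # interface unknowns
X = apply_B_inv(F - E_N.T @ lam)                  # local recovery, element by element

# Reference: monolithic solve of the full saddle-point system.
B = np.zeros((K * n_loc, K * n_loc))
for k, Bk in enumerate(blocks):
    s = slice(k * n_loc, (k + 1) * n_loc)
    B[s, s] = Bk
full = np.block([[B, E_N.T], [E_N, np.zeros((n_lam, n_lam))]])
ref = np.linalg.solve(full, np.vstack([F, np.zeros((n_lam, 1))]))
```

The only globally coupled solve involves `S`, whose size equals the number of interface unknowns; the explicit block inverses are used here for brevity and a factorization per block would serve equally well.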

The system (9) solves for the interface degrees of freedom between the elements and
will always be smaller than the full global system. For a comparison of the size of the
$\lambda$ system with the full system see Table 1 (for 2D) and Table 2 (for 3D). On the left of
Tables 1 and 2 we see that, for constant K, as the order of the polynomial basis increases
the growth in size of the $\lambda$ system is less than the growth in size of the full system. Thus,
hybrid formulations are beneficial for high order methods where the local degrees of
freedom of an element are much higher than the interface degrees of freedom.

Table 1 For 2D

N  | λ system || K    | λ system
5  |          || 400  | 2280
10 |          || 1600 | 9360
15 |          || 3600 | 21,240
20 | 240      || 6400 | 37,920

Left: Number of total unknowns as a function of N, for K = 3 x 3. Right: Number of total
unknowns as a function of the number of elements K, for N = 3


Table 2 For 3D

N  | λ system | ratio || K         | λ system   | ratio
5  | 1350     | 0.08  || 8000      | 205,200    | 0.16
10 | 5400     | 0.04  || 64,000    | 1,684,800  | 0.16
15 | 12,150   | 0.03  || 216,000   | 5,734,800  | 0.16
20 | 21,600   | 0.02  || 512,000   | 13,651,200 | 0.16
25 | 33,750   | 0.02  || 1,000,000 | 26,730,000 | 0.17

Left: Number of total unknowns as a function of N, for K = 3 x 3 x 3. Right: Number of total
unknowns as a function of the number of elements K, for N = 3


On the right of Tables 1 and 2 we see that, for constant N, the $\lambda$ system is smaller
than the full system, although the ratio of the sizes of the $\lambda$ and full systems does
not change significantly.
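
The interface counts in Tables 1 and 2 can be reproduced with a simple counting argument. The sketch below rests on assumptions that are not stated in the text: the mesh has k elements per direction, only interior element faces carry interface unknowns ($N^{d-1}$ per face), each element carries $d\,N^{d-1}(N+1)$ velocity and $N^d$ pressure unknowns, and the tabulated ratio is the interface count divided by the total including the interface unknowns. Under these assumptions the surviving table entries are recovered.

```python
def lambda_size(k, N, dim):
    """Interface unknowns on a mesh with k elements per direction (assumed counting)."""
    interior_faces = dim * k ** (dim - 1) * (k - 1)
    return interior_faces * N ** (dim - 1)


def local_size(k, N, dim):
    """Velocity plus pressure unknowns summed over all k**dim elements."""
    per_element = dim * N ** (dim - 1) * (N + 1) + N ** dim
    return k ** dim * per_element


def ratio(k, N, dim):
    """Share of interface unknowns in the total number of unknowns."""
    lam = lambda_size(k, N, dim)
    return lam / (lam + local_size(k, N, dim))


# 3D, K = 3 x 3 x 3: the interface share falls as N grows.
for N in (5, 10, 15, 20, 25):
    print(N, lambda_size(3, N, 3), round(ratio(3, N, 3), 2))

# 3D, N = 3: the interface share stays nearly constant as K grows.
for k in (20, 40, 60, 80, 100):
    print(k ** 3, lambda_size(k, 3, 3), round(ratio(k, 3, 3), 2))
```

The interface count scales like $N^{d-1}$ per face while the local count scales like $N^d$ per element, which is the reason the ratio drops under p-refinement but not under h-refinement.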


6 Results 


In this section we present the results for a test problem from [7] by solving system
(7). The domain of the test problem is $\Omega = [0, 1]^2$. The RHS term is defined as
$f_{ex} = \nabla\cdot(-\mathbb{A}\nabla p_{ex})$, where,

$$
\mathbb{A} = \frac{1}{x^2 + y^2 + \alpha}
\begin{pmatrix}
10^{-3} x^2 + y^2 + \alpha & (10^{-3} - 1)\, x y \\
(10^{-3} - 1)\, x y & x^2 + 10^{-3} y^2 + \alpha
\end{pmatrix},
\qquad
p_{ex} = \sin(2\pi x)\, \sin(2\pi y),
$$

and Dirichlet boundary conditions are imposed along the entire boundary, $\Gamma_D = \partial\Omega$
and $\Gamma_N = \emptyset$. We solve this problem on an orthogonal and a curved mesh, see Fig. 2.
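
A quick numerical check shows why this tensor makes a demanding test: it is symmetric positive definite but strongly anisotropic, with principal directions that rotate with position. The sketch below evaluates $\mathbb{A}$ at one point and compares its eigenvalues with the closed-form values 1 (direction $(-y, x)$) and $(10^{-3} r^2 + \alpha)/(r^2 + \alpha)$ (direction $(x, y)$), with $r^2 = x^2 + y^2$. The value `alpha=0.1` is an assumed placeholder, since the value used in the paper is not given in the text above, and the function names are hypothetical.

```python
import numpy as np


def permeability(x, y, alpha=0.1, eps=1e-3):
    """Tensor A of the test problem; alpha = 0.1 is an assumed placeholder value."""
    scale = 1.0 / (x**2 + y**2 + alpha)
    return scale * np.array([
        [eps * x**2 + y**2 + alpha, (eps - 1.0) * x * y],
        [(eps - 1.0) * x * y, x**2 + eps * y**2 + alpha],
    ])


def p_exact(x, y):
    """Manufactured pressure solution."""
    return np.sin(2.0 * np.pi * x) * np.sin(2.0 * np.pi * y)


x, y = 0.3, 0.7
A = permeability(x, y)
eigenvalues = np.linalg.eigvalsh(A)       # ascending order

r2 = x**2 + y**2
expected = np.array([(1e-3 * r2 + 0.1) / (r2 + 0.1), 1.0])
print(eigenvalues, expected)
```

Since $p_{ex}$ vanishes on the boundary of the unit square, the Dirichlet data on $\Gamma_D$ are homogeneous up to rounding.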

The same problem was earlier addressed in [6], but for a method with continuous
elements and primal basis functions only. For the configuration K = 3 x 3, N = 6,
we compare the sparsity structure of the two approaches in Fig. 3. On the left we see
the hybrid formulation, and on the right we see the continuous elements formulation
[6]. The number of non-zero entries is almost half in the hybrid formulation,
66,384, as compared to the continuous element formulation, 117,504. Here, the
sparsity is due to the use of algebraic dual degrees of freedom and is not because of
hybridization of the scheme.

Fig. 2 Mesh configuration: K = 3 x 3, N = 6. Left: orthogonal. Right: curved

Fig. 3 Sparsity plots K = 3 x 3, N = 6. Left: hybrid elements method. Right: continuous element
method

In Fig. 4, on the left we compare the growth in condition number of the $\lambda$
system (9) with that of the full continuous element system, for N = 7 on the curved mesh,
with increasing number of elements, K. We observe similar growth rates for the hybrid
and continuous formulations; however, the condition number for the continuous elements
formulation is almost $O(10^7)$ higher. On the right we see the growth in condition
number with increasing polynomial degree for K = 9 x 9 on the curved mesh. A
reduced growth rate in condition number for the hybrid formulation is observed. Thus
hybrid formulations are beneficial for high order methods.

Fig. 4 Growth in condition number for hybrid elements in dark line, and continuous elements in
dotted line. Left: h-refinement; Right: N-refinement. 'c' refers to the curved mesh

Fig. 5 $L^2$-error in divergence of velocity: Left: h-refinement; Right: N-refinement. 'o' refers to
the orthogonal mesh and 'c' to the curved mesh


In Fig. 5 we show the $L^2$-error $\|\nabla\cdot u^h - f^h\|$, on the left side as a function of
element size, $h = 1/\sqrt{K}$, and on the right side as a function of the polynomial degree
of the basis functions. In both cases the maximum error observed is of $O(10^{-7})$.


In Fig. 6, in the top two figures we show the error in the $H(\mathrm{div};\Omega)$ norm for
the velocity, and in the bottom two figures we show the error in the $L^2(\Omega)$ norm for
the pressure. On the left we have h-convergence plots, and on the right we have
N-convergence plots. In all the figures, for the same number of elements, K, and
polynomial degree, N, the error is higher for the curved mesh.

On the left we see that the error decreases with the element size. The slope of
the error, i.e. the rate of convergence, is N, which is optimal for both curved and
orthogonal meshes. On the right we see exponential convergence of the error with
increasing polynomial degree of the basis for both orthogonal and curved meshes.

Fig. 6 Top row: error in $H(\mathrm{div};\Omega)$ norm for velocity; Bottom row: $L^2$-error in pressure. Left:
h-refinement; Right: N-refinement. 'o' refers to the orthogonal mesh and 'c' to the curved mesh

High-Order Mesh Generation Based on Optimal Affine Combinations of Nodal
Positions


Mike Stees and Suzanne M. Shontz 


1 Introduction 


The advantage of high-order numerical methods for solving partial differential 
equations is their higher degree of accuracy compared to low-order numerical 
methods. A major hurdle in the usage of these methods in the presence of complex 
geometries is the absence of robust high-order mesh generation methods [23]. In 
other words, these methods need a high-order mesh that accurately captures the 
features of the geometry to achieve their full potential [1, 10]. 

The typical approach for generating high-order meshes is to transform a coarse 
linear mesh [2-6, 9, 11, 12, 14, 16, 17, 19-22, 24]. At a high level, these
transformations usually consist of the following three steps: (1) the low-order mesh is
enriched with additional nodes; (2) the new nodes that lie along the boundary of 
the mesh are moved to the true boundary; (3) the interior nodes are moved based 
on the boundary deformation. The main challenge of these methods arises from 
step (2). In particular, the curving of the elements along the boundary can result in 
invalid mesh elements. With that in mind, these high-order mesh generation methods 
use different approaches in step (3) in an effort to obtain a valid high-order mesh. 
Methods for transforming the linear mesh usually fall into two groups. The first
group of methods transforms the mesh based on the solution to a partial differential
equation [3, 12, 14, 24]. The second group of methods is based on optimization of
an objective function [2, 4, 5, 9, 16-21].

In this paper, we describe an optimization-based approach for generating high- 
order meshes based on affine combinations of nodal positions. The remainder of 
this paper is organized as follows. In Sect. 2, we present our new method for high- 
order mesh generation. In Sect. 3, we demonstrate the performance of our proposed 
method on several aerospace engineering geometries. Finally, in Sect. 4, we offer 
some concluding remarks and possible directions for our future work. 


2 High-Order Mesh Generation Based on Affine 
Combinations of Nodal Positions 


In this section, we present our optimization-based method for high-order mesh 
generation. Our proposed method uses affine combinations of nodal positions to 
determine the movement of the interior nodes after deforming the boundary. Our 
method consists of three steps. First, for each interior node in the high-order straight- 
sided mesh, an optimization problem is solved to calculate a set of weights that 
relates the interior node to its neighbors. Second, the boundary nodes are moved 
to the true boundary. Third, the new positions of the interior nodes are calculated 
by solving a linear system of equations using the weights and the new boundary 
positions. In spirit, this method is similar to the weight-based method that we 
proposed in [19] with two major differences. The first difference is that we propose 
an affine combination of nodal positions in this work, as opposed to a convex 
combination. This change allows us to remove the inequality constraint and log- 
barrier term, leaving only the equality constraints. We also propose an alternative 
objective function that when combined with the equality constraints allows us to 
directly solve the optimization problem via a QR factorization. This change results 
in simplified computational complexity and faster execution time. 

To frame our discussion of the method, we introduce the following notation for 
the 2D formulation of the problem; the 3D formulation is similar. Let the x- and 
y-coordinates of the ith interior node be represented as $(x_i, y_i)$. In addition, define
the x- and y-coordinates of the vertices adjacent to node i as $\{(x_j, y_j) : j \in N_i\}$,
where $N_i$ is the set of neighbors of node i. For each interior node i, this information
can be represented as the following linear system, where $w_{ij}$ are the weights:

$$ \sum_{j \in N_i} w_{ij}\, x_j = x_i, $$

$$ \sum_{j \in N_i} w_{ij}\, y_j = y_i, $$

where
$N_i$ = {high-order nodes of the patch to which i belongs}.


There are several potential choices for the local neighboring set based on use 
of the low-order nodes, high-order nodes, or both. We include only the high- 
order nodes as neighbors, as only the high-order boundary nodes move during the 
boundary deformations. Using either the low-order nodes or both the low- and 
high-order nodes would dampen the effect the boundary deformation has on the 
interior nodes, which might lead to tangling near the boundary. Including additional 
nodes as neighbors would also result in a less sparse matrix when solving (7). 
Another important consideration is that while the weight calculation is based on 
only the local neighbors, the position of an interior node is indirectly affected by the 
deformation of all the interior nodes through the solution of (7). 

Adding the additional constraint that the weights sum to one results in the
following linear system Aw = b for finding an affine combination of the x- and
y-coordinates of the vertices adjacent to node i:

$$
\begin{bmatrix} x_1 & x_2 & \dots & x_n \\ y_1 & y_2 & \dots & y_n \\ 1 & 1 & \dots & 1 \end{bmatrix}
\begin{bmatrix} w_{i1} \\ w_{i2} \\ \vdots \\ w_{in} \end{bmatrix}
=
\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix},
$$

where $n = |N_i|$. Based on the set of neighbors, this linear system will be
underdetermined (i.e., A is $m \times n$ with m < n) in general. If we assume that A
has full rank, we can find one particular solution to our problem by requiring that
w has the smallest norm of any solution. This results in the following optimization
problem:

$$ \min_{w} \|w\|_2^2 \qquad (1) $$

$$ \text{subject to } Aw = b. \qquad (2) $$

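From the Karush-Kuhn-Tucker (KKT) theory [13], we know that the following
conditions must hold for a solution $(w^*, \lambda^*)$ to our problem to be optimal:

$$ \nabla_w \mathcal{L}(w^*, \lambda^*) = 0 \qquad (3) $$

$$ Aw^* - b = 0 \qquad (4) $$

$$ \lambda^* (Aw^* - b) = 0. \qquad (5) $$

The Lagrangian of our problem is given by:

$$ \mathcal{L}(w, \lambda) = w^T w - \lambda^T (Aw - b), $$

where $\lambda$ are the Lagrange multipliers.
Using (3)-(5), we can find the solution pair $(w^*, \lambda^*)$ as follows:

$$ \nabla_w \mathcal{L}(w, \lambda) = 2w - A^T \lambda. $$

$$ \nabla_w \mathcal{L}(w, \lambda) = 0 \;\Rightarrow\; w^* = \tfrac{1}{2} A^T \lambda^*. $$

$$ Aw^* - b = 0 \;\Rightarrow\; A\left(\tfrac{1}{2} A^T \lambda^*\right) - b = 0. $$

$$ \lambda^* = 2 (A A^T)^{-1} b. $$

$$ w^* = \tfrac{1}{2} A^T \lambda^* = \tfrac{1}{2} A^T \left(2 (A A^T)^{-1} b\right) = A^T (A A^T)^{-1} b. $$

Although we have verified that $(w^*, \lambda^*)$ is a stationary point, we cannot yet claim
that it is a minimum. To do so, we must investigate $\nabla^2_{ww} \mathcal{L}(w^*, \lambda^*)$:

$$ \nabla^2_{ww} \mathcal{L}(w, \lambda) = 2 I_{|N_i| \times |N_i|} $$

$$ \nabla^2_{ww} \mathcal{L}(w^*, \lambda^*) = 2 I_{|N_i| \times |N_i|}. $$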

From the second-order sufficient conditions, if $w^*$ satisfies (3)-(5) and the following
condition is satisfied:

$$ z^T \nabla^2_{ww} \mathcal{L}(w^*, \lambda^*)\, z > 0, \quad \text{for all } z \in C(w^*, \lambda^*),\ z \neq 0, \qquad (6) $$

where $C(w^*, \lambda^*) = \{z \mid \nabla_w c(w^*)^T z = 0\}$ is the critical cone and c(w) = Aw - b,
then our solution is a minimum. Since $\nabla^2_{ww} \mathcal{L}(w^*, \lambda^*)$ is symmetric positive definite,
the inequality in (6) is satisfied for any choice of z. Thus we can conclude that our
solution $w^*$ is a minimum of (1)-(2).

Now that we have established that $w^*$ is our solution, we will discuss calculating
it via a reduced QR factorization. Suppose that $A^T = QR$, where Q is $n \times m$, R is
$m \times m$ and upper triangular, and $Q^T Q = I_{m \times m}$. Substituting the QR factorization of
$A^T$ into $w^*$, we get the following:

$$
\begin{aligned}
w^* = A^{+} b = A^T (A A^T)^{-1} b &= QR\,(R^T Q^T Q R)^{-1} b = QR\,(R^T R)^{-1} b \\
&= QR\,R^{-1} R^{-T} b = Q R^{-T} b.
\end{aligned}
$$

Rearranging this into linear system form, we have:

$$ R^T Q^T w^* = b. $$

If we let $t = Q^T w^*$, then $R^T t = b$ and $w^* = Qt$. Thus calculating $w^*$ involves a
QR decomposition of $A^T$, solving the lower triangular system $R^T t = b$ by forward
substitution, and calculating the matrix-vector product Qt.
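
The three steps above (reduced QR of $A^T$, forward substitution with $R^T$, product with Q) can be sketched in a few lines. This is an illustration written with NumPy rather than the Matlab code used for the experiments; the function names and the sample patch of five neighbors are hypothetical.

```python
import numpy as np


def forward_substitution(L, b):
    """Solve L t = b for a lower triangular matrix L."""
    t = np.zeros(len(b))
    for i in range(len(b)):
        t[i] = (b[i] - L[i, :i] @ t[:i]) / L[i, i]
    return t


def affine_weights(neighbors, node):
    """Minimum-norm weights w with sum(w) = 1 and sum(w_j * neighbor_j) = node.

    neighbors: (n, 2) array of neighbor coordinates; node: length-2 array.
    """
    n = len(neighbors)
    A = np.vstack([neighbors.T, np.ones(n)])   # 3 x n system matrix
    b = np.append(node, 1.0)
    Q, R = np.linalg.qr(A.T)                   # reduced QR: Q is n x 3, R is 3 x 3
    t = forward_substitution(R.T, b)           # R^T is lower triangular
    return Q @ t


# Hypothetical patch: five neighbors around one interior node.
neighbors = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 1.2]])
node = np.array([0.4, 0.5])
w = affine_weights(neighbors, node)

A = np.vstack([neighbors.T, np.ones(len(neighbors))])
b = np.append(node, 1.0)
w_pinv = np.linalg.pinv(A) @ b                 # reference: pseudoinverse solution
print(w, w.sum())
```

Because the weights are only required to form an affine combination, individual entries may be negative; no inequality constraint is enforced.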

After calculating the weights, a boundary deformation is applied. The final step is
to solve for the new locations of the interior nodes $[\hat{x}_I, \hat{y}_I]$ by solving the following
global linear system:

$$ A_I [\hat{x}_I, \hat{y}_I] = -A_B [\hat{x}_B, \hat{y}_B], \qquad (7) $$

where $\hat{x}_B$ and $\hat{y}_B$ are the new x- and y-coordinates for the boundary nodes, and $A_I$
and $A_B$ contain the weights for the interior nodes and boundary nodes, respectively.
In this global linear system, each row of the weight matrix corresponds to an 
interior node with nonzero entries for the node's neighbors and zero entries for 
the remainder of the row. The resulting global weight matrix is very sparse with 
irregular structure. In an effort to shift the nonzero entries closer to the diagonal, we 
apply the sparse reverse Cuthill-McKee ordering provided in Matlab. In Fig. 1, we 
show the matrix sparsity plots for the natural node ordering and the updated node 
ordering for the first two examples in Sect. 3. After applying the matrix reordering, 
the linear system is solved using a sparse LU factorization. 
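
A small dense example illustrates the global solve in (7) and one useful consequence of using affine weights: if the boundary undergoes an affine map, the interior nodes follow the same map exactly. The sketch below is a toy configuration (four boundary nodes, two interior nodes, every other node treated as a neighbor), not one of the meshes from the paper. It uses `numpy.linalg.lstsq` to obtain the minimum-norm weights and a dense solve in place of the reordered sparse LU factorization.

```python
import numpy as np

boundary = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
interior = np.array([[0.4, 0.5], [0.6, 0.5]])
nodes = np.vstack([boundary, interior])
n_b, n_i = len(boundary), len(interior)

# Row r of W encodes sum_j w_rj * x_j - x_r = 0 for interior node r.
W = np.zeros((n_i, n_b + n_i))
for r in range(n_i):
    i = n_b + r
    nbrs = [j for j in range(n_b + n_i) if j != i]       # toy neighbor set
    A = np.vstack([nodes[nbrs].T, np.ones(len(nbrs))])
    b = np.append(nodes[i], 1.0)
    W[r, nbrs] = np.linalg.lstsq(A, b, rcond=None)[0]    # minimum-norm affine weights
    W[r, i] = -1.0

A_B, A_I = W[:, :n_b], W[:, n_b:]

# Deform the boundary with an affine map x -> T x + c.
T = np.array([[1.2, 0.3], [-0.1, 0.9]])
c = np.array([0.5, -0.2])
new_boundary = boundary @ T.T + c

# Global system (7): both coordinate columns are solved at once.
new_interior = np.linalg.solve(A_I, -A_B @ new_boundary)
print(new_interior)
print(interior @ T.T + c)   # the same affine map applied to the old interior nodes
```

For a general, non-affine boundary deformation no such closed-form comparison exists, and the interior positions are whatever the weighted system produces.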


3 Numerical Experiments 


In this section, we demonstrate the results from applying our method to generate 
several high-order meshes. We use Gmsh [7, 8, 15, 21] to generate the initial straight- 
sided high-order meshes. Our method then uses this mesh to calculate the weights 
(step 1). Next, we curve the boundary nodes (step 2) using Gmsh. The positions of 
the interior nodes in the resulting curved high-order mesh are then updated (step 3) 
by our method. For each example, we show the mesh which results from our method 
(with high-order nodes visible), the mesh element distortion for the curved boundary 
mesh generated using Gmsh, and the distortion for the mesh resulting from our 
method. When reporting the mesh distortion, we list the minimum distortion, 
maximum distortion, average distortion computed over all elements (referred to as
Avg1 in figures), and average distortion computed over curved elements (referred
to as Avg2 in figures). The ideal distortion value is 1, indicating that the element
is straight. We also list the execution times needed for steps 1 and 3 of our method
(excluding I/O) in Table 1. The code was run using Matlab R2018a, and the
wall-clock execution times were measured on a machine with 16 GB of RAM and a
Ryzen 7 1700 CPU. All mesh visualizations and distortion evaluations were done
using Gmsh. Our first example is a third-order mesh of a square region around
a NACA0012 airfoil. In Fig. 2a, b, we show the mesh resulting from our method
and a table of the mesh quality values as measured by the distortion metric. In this
example, our method increased the minimum distortion from 0.744 to 0.799, while
causing only minor changes in the average distortion.

In our second example, we extrude the NACA0012 airfoil and create a third-order
mesh of the resulting region. In Fig. 3a, b, we show the mesh resulting from
our method, and a table of the mesh quality values. For this example, our method
improved the minimum distortion by 0.125, increasing it from 0.317 to 0.442.

Fig. 1 Sparsity plots for the first two examples in Sect. 3: (a and c) the nonzero entries using the
original node ordering; (b and d) the nonzero entries after applying the sparse reverse
Cuthill-McKee ordering

Table 1 The number of elements and the wall clock times for steps 1 and 3 of our method
(excluding I/O) for each example

                                                 Execution time (s)
Example                   | Number of elements | Original ordering | New reordering
NACA0012 airfoil          | 1312               | 11.51
Extruded NACA0012 airfoil | 13,895             | 4826.09
Airbus A319               | 50,400             | 13,956.43


Fig. 2 NACA0012 airfoil example: (a) the mesh resulting from our method and (b) the mesh
quality as measured by the element distortion metric

                 Distortion
Example        | Min   | Max   | Avg1  | Avg2
original mesh  | 0.744 | 1.000 | 0.999 | 0.994
resulting mesh | 0.799 | 1.000 | 0.997 | 0.997

(b)


Fig. 3 Extruded NACA0012 airfoil example: (a) the mesh resulting from our method and (b) the
mesh quality as measured by the element distortion metric

                 Distortion
Example        | Min   | Max   | Avg1  | Avg2
original mesh  | 0.317 | 1.000 | 0.997 | 0.994
resulting mesh | 0.442 | 1.000 | 0.995 | 0.995

(b)

Our third and final example is a second-order mesh of an Airbus A319 aircraft.
Unlike our previous examples, this geometry resulted in tangled elements after
curving the boundary. Although our method still increased the minimum quality,
it was not able to untangle the mesh. To address this, we applied the high-order
regularization scheme available in Gmsh as a post-processing step. Aside from
changing the target Jacobian range to 0.3-2 and fixing the boundary nodes, all other
parameters were left at their default values. The untangling for the original mesh
took 14.14 s, while untangling the mesh resulting from our method required only
1.64 s. In Fig. 4a-c, we show the surfaces of the mesh resulting from our method,
a view of a cut through the interior volume mesh, and a table of the mesh quality
values. In Fig. 4c, we list the distortion for the mesh after curving the boundary,
the distortion for the mesh resulting from our method, and the distortions of both
meshes after applying the regularization scheme in Gmsh.

Fig. 4 Airbus A319 example: (a) surfaces of the mesh resulting from our method; (b) a view of
a cut through the interior volume mesh, and (c) the quality of the mesh with only boundary
curving, the quality of the mesh resulting from our method, and the quality of both meshes
after applying the regularization scheme available in Gmsh

                                  Distortion
Example                         | Min    | Max   | Avg1  | Avg2
original mesh                   | -0.878 | 0.975 | 0.943 | 1.000
resulting mesh                  | -0.449 | 0.970 | 0.970 | 1.000
original mesh after untangling  | 0.206  | 0.975 | 0.945 | 1.000
resulting mesh after untangling | 0.211  | 0.971 | 0.970 | 1.000

(c)

Aside from the third test case, all of these examples were relatively straightfor- 
ward. In each case, our method increased the minimum distortion when compared 
to only curving the nodes along the boundary. While additional testing is necessary 
to confirm this, our results for the third example seem to indicate that our method 
could be used to reduce the severity of the mesh tangling and thus simplify the work 
for an untangling method during post-processing.


4 Concluding Remarks and Future Work


We have presented a new optimization-based method for generating high-order 
meshes. Our examples have shown that the proposed method based on affine 
combinations of nodal positions tends to improve the quality of the most distorted 
elements, while causing minor changes to the least distorted elements. While 
our approach is optimization-based, we have demonstrated that the optimization 
problem can be solved directly using a QR factorization as opposed to the typical 
iterative optimization approach. This change results in reduced computational
complexity and reduced execution time.

As part of our future work, we will consider other definitions for the set of 
neighbors of an interior node. We will also investigate other aspects of the linear 
system including other node reordering schemes and solvers. Finally, we will apply 
the untangling method that we proposed in [20] after extending it to 3D.

Sparse Spectral-Element Methods for the Helically Reduced Einstein Equations


Stephen R. Lau 


1 Introduction 


To model the inspiral and merger of binary objects (black holes or neutron stars),
many researchers have been solving the Einstein equations numerically. Such
simulation involves both the construction of gravitational initial data at time $t_0$ and its
subsequent evolution to a final time $t_F \gg t_0$. Interpretation of experimental
detections of gravitational waves relies on numerical simulation. Moreover, detection
of weak signals is facilitated by statistical techniques alongside "template banks"
of numerically generated signals. We consider a nonstandard problem, solution
of the Einstein equations reduced by helical symmetry, as described by Beetle,
Bromley, Hernández, and Price (BBHP) [1, 2]. Heuristically, helical reduction is a
data+evolution synthesis. Although solutions to the BBHP equations are ultimately
unphysical, they may approximate the early phase of inspiral and serve as reduced
order models. Moreover, they may provide excellent "trial data" (the starting point
for the construction of $t = t_0$ initial data). Finally, they would address bewitching
mathematical issues concerning exact helical symmetry in general relativity.

We consider a spectral element approach [3-8] for solving the BBHP equations.
Although the equations involve a mixed-type operator L, we solve them via
relaxation using a Broyden-Krylov approach. The computational domain which
surrounds the compact objects is split into 11+ subdomains (blocks, spherical
shells, and cylindrical shells with classical spectral expansions thereon). To rapidly
solve the linear systems arising in our scheme, we have developed sparse modal
methods based on the application of spectral integration matrices, extending ideas
originally described in the 1990s by Coutsias, Hagstrom, Hesthaven, and Torres.
We use preconditioned GMRES to solve these systems, with standard domain
decomposition methods. In addition, we have developed fast methods for inversion
of subdomain approximations of L, either via modal-based preconditioning or direct
schemes.


2 Background 


This section first describes the helically reduced wave equation (HRWE), a model 
for the helically reduced Einstein equations. Our HRWE description fixes ideas. 


2.1 Helically Reduced Wave Equation 


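The wave equation is $\Box\psi = 0$, where $\Box = -\partial_t^2 + \Delta_x$. Assume that $\psi$ rotates
rigidly with rate $\Omega$. With the z-axis as the rotation axis, $\psi$ then depends on
time t only through $\tilde{\varphi} = \varphi - \Omega t$, where $\varphi$ is the azimuthal angle. Via the
$\partial_t \to -\Omega\,\partial_{\tilde{\varphi}}$ replacement, $\Box\psi = 0$ becomes the HRWE $L\psi = 0$, where
$L = \Delta_{\tilde{x}} - \Omega^2 (\tilde{x}\,\partial_{\tilde{y}} - \tilde{y}\,\partial_{\tilde{x}})^2$ in terms of co-rotating coordinates
$(\tilde{t}, \tilde{x}, \tilde{y}, \tilde{z}) = (t,\ x\cos\Omega t + y\sin\Omega t,\ y\cos\Omega t - x\sin\Omega t,\ z)$.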

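We adopt a “2-center domain” $\mathcal{D}$, a 3d ball with two smaller 3d balls excised
from it; see Fig. 1. Its boundary $\partial\mathcal{D} = \partial S_I^- \cup \partial S_{II}^- \cup \partial S_{out}^+$ is the union of two inner
spheres (the $-$'s) and one outer sphere (the $+$). We consider the mixed-type problem

$$L\psi = 0, \qquad \psi = f^- \ \text{on}\ \partial S_I^- \cup \partial S_{II}^-, \qquad (\partial_r - \Omega\partial_\varphi + r^{-1})\psi = f^+ \ \text{on}\ \partial S_{out}^+, \tag{1}$$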


Fig. 1 2-center domain decomposition. (a) Inner domain decomposition: the $z$-cross section of the inner
subdomains with $S_{out}$ suppressed. (b) Double cross section: all subdomains, although for visualization
the outer radius for $S_{out}$ has been chosen rather small. In this work the blocks $B^2$ and $B^4$ are absent.
with inner Dirichlet conditions and an outer radiation boundary condition on $\psi$. For
simplicity here, we have put on $\partial S_{out}^+$ a simple Sommerfeld condition; in practice,
$f^+$ is a nonlocal function of $\psi$ enforcing an exact Dirichlet-to-Neumann map [4].

A class of solutions to (1) stems from Liénard-Wiechert potentials. Indeed, 
consider Ly = —16zy Mó(X — Хр). The “particle” location X, = (+a, 0, 0) is 
the center of either $\partial S_I^-$ or $\partial S_{II}^-$; whence the equation is homogeneous on $\mathcal{D}$. The
mass $M$ and relativistic factor $\gamma = (1 - v^2)^{-1/2}$ are constants, with $v = a\Omega < 1$
so the particle moves subluminally in the $(t, x, y, z)$ frame. The retarded solution to
this problem is 


TEEPE ing, z) Mid 2M о 
х, у, <) = cosg, p sing, z) =: y — 
+ ИЕ А F vp sin(g + (2A) А R 


Evaluation of this expression involves a numerical component: solution of the fixed
point equation $\Lambda = [z^2 + \rho^2 + a^2 \mp 2a\rho\cos(\varphi + \Omega\Lambda)]^{1/2}$ for (the retarded time) $\Lambda$.
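The fixed-point equation for the retarded time can be solved by plain iteration, since the map is a contraction with factor at most $v = a\Omega < 1$. The following is a minimal sketch under stated assumptions: the function name `retarded_time`, the argument names, and the convention `sign = -1` for a particle at $(+a, 0, 0)$ are illustrative choices, not taken from the author's code.

```python
import math

def retarded_time(rho, phi, z, a, omega, sign=-1.0, tol=1e-14, max_iter=500):
    """Iterate Lambda <- sqrt(z^2 + rho^2 + a^2 + sign*2*a*rho*cos(phi + omega*Lambda)).

    sign = -1 is assumed here to correspond to a particle at (+a, 0, 0);
    sign = +1 to a particle at (-a, 0, 0). Hypothetical helper for illustration.
    """
    def rhs(lam):
        arg = z * z + rho * rho + a * a + sign * 2.0 * a * rho * math.cos(phi + omega * lam)
        return math.sqrt(max(arg, 0.0))  # guard against tiny negative round-off

    lam = rhs(0.0)  # start from the instantaneous (non-retarded) distance
    for _ in range(max_iter):
        new = rhs(lam)
        if abs(new - lam) <= tol * max(1.0, new):
            return new
        lam = new
    raise RuntimeError("fixed-point iteration did not converge")
```

With `omega = 0` the result reduces to the Euclidean distance between the field point and the particle, which gives a quick sanity check.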


2.2 Helically Reduced Einstein Equations 


We consider the vacuum Einstein equations in Landau-Lifshitz form. Write the 
(densitized contravariant) metric tensor as $\mathfrak{g}^{\mu\nu} = \eta^{\mu\nu} - \bar h^{\mu\nu}$, where $\eta^{\mu\nu} =$
diag$(-1, 1, 1, 1)$ is the flat metric and $\bar h^{\mu\nu}$ is the metric perturbation. Assume the
harmonic gauge condition $\partial_\mu \bar h^{\mu\nu} = 0$. Then the vacuum Einstein equations can be
expressed as 


udi Oh ор 92g 


— gitveB | 
в 5 дхё дх®дхВ 


toya B 


(3) 


where a yo В (g) depends оп д^” and its inverse g,,,, (but not on derivatives of either). 
Einstein's equations are then a constrained system of 10 nonlinear wave equations.

The BBHP reduction of the Einstein equations is similar to the one outlined for 
the wave equation. Technically, it assumes the existence of a Killing vector field, 
but we give a brief and heuristic description of their approach. The challenge is
that the perturbations $\bar h^{\mu\nu}$ themselves are not “helical scalars”; helical reduction is
therefore not tantamount to the replacement $\Box\bar h^{\mu\nu} \to L\bar h^{\mu\nu}$. However, BBHP have
introduced helical scalars $\psi^A$ through which the reduction can be carried out. These
are


y oem — hit, yor) = J2h', y 00) = Joe + АУУ + h=) 
y 20) = a= + АУУ — 2h**), y QD - eit ( — hi* $ іл) (4) 
y QD = © — р + 1155), y 22) = e21] 


5(h** — А>?) — іл]. 
These 4 real and 3 complex quantities contain the 10 degrees of freedom in the $\bar h^{\mu\nu}$.
We express this transformation as $\psi^A = e^{i\mu(A)\Omega t} M^A{}_{\mu\nu}\bar h^{\mu\nu}$, where $A$ runs over
the (tensor-spherical-harmonic) labels, and $\mu(A)$ is 0, 1, or 2. Contraction of $M^A{}_{\mu\nu}$
on (3), with subsequent helical reduction based on the action of $\Box$ on $\psi^A$, yields


Ly^ —2ip (A) 228, ^ + u^ (A) 2^ y^ = 


hP yA, — ip (A) Qh — AR h YA + МА Spr OR kh” в. 
(5) 


Here L is the operator appearing in the HRWE. Similar to the boundary conditions 
appearing in (1), the boundary conditions we adopt for (5) are 


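$$\psi^A = (f^A)^- \ \text{on}\ \partial S_I^- \cup \partial S_{II}^-, \qquad (\partial_r - \Omega\partial_\varphi + r^{-1})\psi^A = (f^A)^+ \ \text{on}\ \partial S_{out}^+. \tag{6}$$

Again for simplicity, here we have a Sommerfeld condition on $\partial S_{out}^+$, but in practice
use a nonlocal outgoing condition based on an exact Dirichlet-to-Neumann map.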

Price has written down the analog of (2) for linearized gravity, i.e. (5) when the 
right-hand side of the equation is set to zero. This solution may be viewed as an exact 
solution to $\Box\bar h^{\mu\nu} = -16\pi T^{\mu\nu}$, where the stress energy tensor $T^{\mu\nu}$ corresponds to a
massive point particle in an eternal circular orbit (as discussed, such a point source
is excised from our domain $\mathcal{D}$). Price's solution is analogous to the electromagnetic
solution given by G. A. Schott, and it is given by the leading $1/R$ terms in the
appendix expressions (12). Unfortunately, $\partial_\mu \bar h^{\mu\nu} \neq 0$ for this solution.


3 Sparse Spectral Element Methods 


This section summarizes our numerical methods. It is necessarily impressionistic, 
as even an incomplete presentation of details would take too much space. 


3.1 Overview 


We split $\mathcal{D}$ into subdomains, here with the minimal configuration of 11 subdomains:
blocks $B^{1,3,5}$; cylinders $C^{1,2,3,4,5}$; an inner shell $S_I$ around “particle” I; an inner
shell $S_{II}$ around “particle” II; and an outer shell $S_{out}$. This corresponds to a “binary
blackhole” (BBH) domain with two excised inner balls. For a “binary neutron star”
(BNS) domain, we further split both $S_I$ and $S_{II}$ into two overlapping concentric
spheres (a stellar surface then resides in each overlap [8]), and fill in the excised
regions with two extra blocks $B^{2,4}$. Figure 1 depicts a BNS domain. Pioneered by
Pfeiffer et al. [9], such decompositions are used in the EllipticSolver of SpEC
[10].

The unknowns in our approach are the modal expansion coefficients associated 
with subdomain expansions in terms of classical (Chebyshev, Fourier, and spherical 
harmonic) basis functions. As described below, we make extensive use of integration 
matrices to achieve sparse representations of the relevant operators. Before presenting
details, we first address how we handle the nonlinearities in (5). Let $\boldsymbol{\psi}^A$
(the vector of unknowns) be a concatenation of the modal coefficients from all 11
subdomains. Then, as sketched below, upon approximation (5) becomes

$$\mathcal{B}\mathcal{L}\boldsymbol{\psi}^A = \mathcal{B}\mathbf{g}^A, \tag{7}$$

where $\mathcal{B}\mathcal{L}$ approximates the operator on the left-hand side of (5), with $\mathcal{B}$ representing
the action of “integration preconditioning” (see below) on each subdomain.
For linearized gravity, with the right-hand side of (5) set to zero, the vector $\mathbf{g}^A$ is
zero, save for select entries related to the inner Dirichlet values $(f^A)^-$ of $\psi^A$ on
$\partial S_I^- \cup \partial S_{II}^-$. For the full Einstein equations $\mathbf{g}^A$ depends on $\boldsymbol{\psi}^A$, and its evaluation
relies on spectral analysis/synthesis (forward/backward transform) and numerical
differentiation on each subdomain. We then view

$$\boldsymbol{\psi}^A = (\mathcal{B}\mathcal{L})^{-1}\mathcal{B}\mathbf{g}^A(\boldsymbol{\psi}) \tag{8}$$

as a fixed-point equation, accelerating its convergence with the Broyden algorithm.
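To illustrate how a fixed-point equation of this kind can be accelerated, here is a minimal sketch of Broyden's ("good") method applied to the residual $F(x) = x - G(x)$. It is not the author's solver: the dense inverse-Jacobian update, the function names, and the toy problem (a small tridiagonal matrix standing in for the linear operator, with a quadratic nonlinearity standing in for the right-hand side) are all assumptions made for illustration. In the setting above, each evaluation of `G` would itself involve a GMRES solve.

```python
import numpy as np

def broyden_fixed_point(G, x0, tol=1e-12, max_iter=50):
    """Solve x = G(x) by Broyden's good method on F(x) = x - G(x).

    H approximates the inverse Jacobian of F; H = I reproduces the plain
    fixed-point step x <- G(x) on the first iteration.
    """
    x = np.asarray(x0, dtype=float).copy()
    F = x - G(x)
    H = np.eye(x.size)
    for it in range(max_iter):
        if np.linalg.norm(F) <= tol:
            return x, it
        dx = -H @ F                      # quasi-Newton step
        x_new = x + dx
        F_new = x_new - G(x_new)
        dF = F_new - F
        HdF = H @ dF
        denom = dx @ HdF
        if denom != 0.0:                 # Sherman-Morrison form of the secant update
            H += np.outer(dx - HdF, dx @ H) / denom
        x, F = x_new, F_new
    raise RuntimeError("Broyden iteration did not converge")

def toy_problem(n=5):
    """Hypothetical stand-in: G(x) = A^{-1} (b + 0.1 x^2) with A tridiagonal."""
    A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    return lambda x: np.linalg.solve(A, b + 0.1 * x**2)

if __name__ == "__main__":
    G = toy_problem()
    x, its = broyden_fixed_point(G, np.zeros(5))
    print(its, np.linalg.norm(x - G(x)))
```

A production code would typically use a limited-memory variant rather than storing the dense matrix `H`; the dense form is used here only to keep the update formula visible.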

This approach relies on approximation and inversion of the operator appearing on
the left-hand side of (5). Reference [4] is a detailed account of the case $\mu(A) = 0$,
i.e. the HRWE. For $\mu(A) = 1$ or 2, the operator mixes the $U^A$ and $V^A$ in $\psi^A =
U^A + iV^A$. We have not yet described our treatment of this scenario, but note that it
relies on Schur-complement techniques. Here we describe only the $\mu(A) = 0$ case.


3.2 Integration Preconditioning and Other Key Aspects 


We use “integration preconditioning” [11] in order to achieve sparse linear
systems. Especially for the cylinders and inner shells, the details are formidable.
We convey the basic ideas with the Laplacian $\Delta$, rather than $L$, on a
cube (suppressing tildes on the co-rotating coordinates). Let $u(x,y,z) \approx
\sum_{i=0}^{N_x}\sum_{j=0}^{N_y}\sum_{k=0}^{N_z}\tilde u_{ijk}T_i(x)T_j(y)T_k(z)$ obey $\Delta u = g$ on $\mathcal{C} = [-1, 1]^3$. An
approximation of the Poisson equation is

$$(D_x^2\otimes I_y\otimes I_z + I_x\otimes D_y^2\otimes I_z + I_x\otimes I_y\otimes D_z^2)\tilde{\mathbf{u}} = \tilde{\mathbf{g}}, \tag{9}$$

with $\tilde{\mathbf{u}}$ the vector of $\tilde u_{ijk}$ with appropriate ordering, and $D^2$ representing double
differentiation in the modal Chebyshev basis. Let $B^2_{[2]}$ represent double integration
in the modal basis, where the $[2]$ indicates that the first two rows have all zero
entries, so that $B^2_{[2]}D^2 = I_{[2]}$ is the identity with its first two rows zeroed. To (9) we
apply the “preconditioner” $\mathcal{B} = B^2_{x[2]}\otimes B^2_{y[2]}\otimes B^2_{z[2]}$, thereby
reaching

$$(I_{x[2]}\otimes B^2_{y[2]}\otimes B^2_{z[2]} + B^2_{x[2]}\otimes I_{y[2]}\otimes B^2_{z[2]} + B^2_{x[2]}\otimes B^2_{y[2]}\otimes I_{z[2]})\tilde{\mathbf{u}} = \mathcal{B}\tilde{\mathbf{g}}. \tag{10}$$

The system (10) is sparse, and the number of empty rows (all 0's) equals the number
of tau-conditions to be enforced, auxiliary equations enforcing, say, $u|_{\partial\mathcal{C}} = f$.
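The one-dimensional building blocks of this construction can be checked numerically. The sketch below builds the modal Chebyshev differentiation matrix and a single-integration matrix with a zeroed first row, forms the double-integration matrix with its first two rows zeroed, and confirms that it inverts double differentiation up to those two rows. The function names are hypothetical, and the matrices follow the standard Chebyshev recurrences rather than any code from the paper.

```python
import numpy as np

def cheb_diff(N):
    """Modal Chebyshev differentiation: b_k = (2/c_k) * sum_{p>k, p+k odd} p * a_p."""
    D = np.zeros((N + 1, N + 1))
    for k in range(N + 1):
        ck = 2.0 if k == 0 else 1.0
        for p in range(k + 1, N + 1, 2):   # p + k odd
            D[k, p] = 2.0 * p / ck
    return D

def cheb_int(N):
    """Modal Chebyshev integration with the T_0 coefficient (first row) set to zero."""
    B = np.zeros((N + 1, N + 1))
    if N >= 1:
        B[1, 0] = 1.0                       # integral of T_0 is T_1
    for k in range(1, N):
        B[k + 1, k] = 1.0 / (2.0 * (k + 1))
    for k in range(2, N + 1):
        B[k - 1, k] = -1.0 / (2.0 * (k - 1))
    return B

def second_derivative(N):
    D = cheb_diff(N)
    return D @ D

def double_integration(N):
    """B^2_[2]: double integration with the first two rows zeroed."""
    B = cheb_int(N)
    B2 = B @ B
    B2[:2, :] = 0.0
    return B2

if __name__ == "__main__":
    N = 16
    I2 = np.eye(N + 1)
    I2[:2, :] = 0.0
    print(np.allclose(double_integration(N) @ second_derivative(N), I2))
```

The matrix returned by `double_integration` has nonzero entries only on the diagonals at offsets $-2$, $0$, and $+2$, which is the source of the sparsity in (10), whereas `second_derivative` is dense upper triangular.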

“Integration preconditioning” of (9) results in the sparse system (10), but the
issue of condition number is subtle. Indeed, passage from (9) to (10) arguably
worsens conditioning. For this reason, integration preconditioning has been viewed
as ill-suited to PDEs; we prefer to call the technique integration sparsification. Conditioning
issues are then surmounted either by further genuine preconditioning on top of the
sparsification or by fast direct solves. Our use of 2-center domains with sparse modal-
tau methods features: (a) sparse representation of $L$ on subdomains; (b) “gluing”
of conforming and overlapping subdomains; (c) modal-based preconditioning of
subdomain solves; (d) “fast” direct-solves on blocks and $S_{out}$; (e) standard global
preconditioning; (f) low-rank treatment of stellar surfaces; (g) nonlocal domain
reduction.


3.3 Current Complexity Estimates 


We aim to solve the linear systems (say approximating the HRWE or the Helmholtz
equation posed on the 2-center domain $\mathcal{D}$) arising in our problems at demonstrably
sub-quadratic complexity; indeed, we believe that an order-3 complexity is
achievable. This is the complexity associated with matrix-vector multiplication;
despite our sparse representations, the “gluing” of overlapping subdomains in
our decomposition of $\mathcal{D}$ prevents realization of a linear-complexity matrix-vector
product.

To document progress towards our goal, we summarize our current complexity 
estimates associated with solution of the HRWE on each subdomain type. These 
solves serve as part of our preconditioner for the global GMRES solution of the 
HRWE on $\mathcal{D}$. For this discussion, let $N$ represent the total number of modes on a
given subdomain; e.g. $N = (N_x + 1)(N_y + 1)(N_z + 1)$ for the block considered
above.

For $S_{out}$ let $M$ be the matrix which represents $r^2L$ (the $r^2$ factor here is explained
in [4]) and includes inserted tau-vectors to enforce boundary conditions. Ignoring
tau-vectors, $M$ is block diagonal in the spectral space of spherical harmonics
indexed by $(\ell, m)$. View its elements as $M_{\ell mk,\ell'm'k'}$, where $\ell mk$ is a “clumped
index” and $k, k'$ are Chebyshev indices. Then, apart from tau-vectors, $M_{\ell mk,\ell'm'k'} =
0$, unless $\ell = \ell'$ and $m = m'$. Moreover, each $(N_r + 1)\times(N_r + 1)$ block $M_{\ell mk,\ell mk'}$
is itself sparse and banded. While these desirable structures are somewhat spoiled
by the tau-conditions, through the use of the Woodbury formula and band solvers,
for $S_{out}$ we are able to directly invert $M$ (i.e. solve the HRWE on $S_{out}$) at $O(N)$ cost.
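The combination of a band solver with the Woodbury formula can be sketched on a toy problem: a tridiagonal matrix $A$ perturbed by a rank-$k$ term $UV^T$, the latter standing in for the dense tau rows. This is an illustrative assumption about the structure, not the author's implementation; the names `thomas` and `woodbury_tridiag_solve` are hypothetical, and the tridiagonal solver below omits pivoting, so it presumes a diagonally dominant $A$.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n); rhs may be a vector or an (n, k) array.

    lower[i-1] = A[i, i-1], upper[i] = A[i, i+1]. No pivoting (assumes
    diagonal dominance).
    """
    n = diag.size
    d = np.array(diag, dtype=float)
    r = np.array(rhs, dtype=float)
    for i in range(1, n):                      # forward elimination
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    x = np.empty_like(r)
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def woodbury_tridiag_solve(lower, diag, upper, U, V, b):
    """Solve (A + U V^T) x = b with A tridiagonal and U, V of shape (n, k).

    Woodbury: x = y - Z (I + V^T Z)^{-1} V^T y, with y = A^{-1} b, Z = A^{-1} U.
    Cost is O(n k^2), i.e. linear in n for fixed rank k.
    """
    y = thomas(lower, diag, upper, b)
    Z = thomas(lower, diag, upper, U)
    S = np.eye(U.shape[1]) + V.T @ Z           # small k-by-k capacitance matrix
    return y - Z @ np.linalg.solve(S, V.T @ y)
```

For a fixed number of tau conditions the small matrix `S` has fixed size, so the overall cost remains linear in the number of unknowns, in line with the $O(N)$ cost quoted above.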

For the inner shells, $S_I$ and $S_{II}$, our representation $M$ of $r^2L$ is only block
banded. Indeed, the centers of these shells lie off the rotation axis, and $r^2L$ mixes
spherical harmonic modes. Its spectral representation [4] is remarkably complicated,
and relies on identities for spherical harmonics found in the treatise [12]. We solve
the HRWE on inner shells via preconditioned GMRES, with a modal block-Jacobi
preconditioner defined by inversion of the diagonal blocks $M_{\ell mk,\ell mk'}$. Apart from
tau-conditions, these blocks are again sparse and banded, and therefore amenable to
the fast methods described above for $S_{out}$. Construction and reuse of this block-Jacobi
preconditioner therefore has $O(N)$ cost. Moreover, we have empirically
observed (see [4]) that such preconditioning yields low and essentially resolution
independent iteration counts. While more analysis is needed, from a practical
standpoint solution of the HRWE on an inner shell has an $O(N)$ cost.

The situation on blocks is worse. Part of our global preconditioner for solving
the HRWE on $\mathcal{D}$ involves inversion of the Poisson problem on blocks as an
approximation to the HRWE (and inversion of the Helmholtz equation when the
spin index $\mu(A) \neq 0$). This works extremely well, likely due to the fact that
$\Omega < 1$ and the blocks are close to the rotation center. In any case, we solve
the Poisson/Helmholtz problem on a block via a direct approach [13] based on a 
rank-augmenting generalization of the Woodbury formula. This direct method is 
empirically well-conditioned and low-memory. Moreover, it has an O(N7) set-up 
cost with a small constant, followed by ап О (N*/5) cost for subsequent solves. If 
possible, we hope to reduce the set-up cost to an O(A?/?) complexity. 

The situation on cylinders is worst of all, although to date we have not 
focused much attention on these subdomains. We solve the HRWE on cylinders 
(or the collection of cylinders) via GMRES, with modal-based preconditioners 
that empirically yield resolution-independent iteration counts. Application of the 
preconditioner currently involves an О (Л//3) set-up cost, followed by ап O (N?/?) 
cost for subsequent solves. Here, we believe improvement is possible. 


4 Numerical Tests 


Our decomposition of $\mathcal{D}$ is from Table IV of [4], except here $\partial S_{out}^+$ has $r_{out} = 15$.
For that table $\partial S_I^-$ has radius $r_{I,\min} = 0.4$ and center $(x_I = -0.9, 0, 0)$, and $\partial S_{II}^-$
has radius $r_{II,\min} = 0.3$ and center $(x_{II} = 1.0, 0, 0)$. The coordinates $X, Y, Z$ in
[4] are $y, z, x$ here. The subdomain truncations (number of modes) adopted here are
nearly the same as those in [4]; however, here we only record the total number of
modes over all subdomains. Unless otherwise stated, $\Omega = 0.075$, $M_I = 0.05$, and
$M_{II} = 0.1$.


4.1 Comparison with Exact Solutions


Our first test uses PriceLG, the aforementioned solution for linearized gravity due
to Price. Here the solution is a superposition of two point-sources with the above
masses. Each source point's contribution to the helical scalars is defined by the leading
$1/R$ term in (12), and these expressions seed inner Dirichlet conditions on $\partial S_I^- \cup
\partial S_{II}^-$. The outer boundary conditions are nonlocal conditions which are exact for this
solution. Table 1 lists errors for $\psi^{(nn)}$; errors for the other scalars are similar. These
are relative L»-errors (against the exact solution) computed on both B? and Sout via 
interpolation onto uniform reference grids. Since the problem is linear, each line of
the table corresponds to a single GMRES solve performed in parallel on 10 nodes. For
the table middle about 0.446 of the matrix entries are nonzero. The last table line has
$\|\partial_\mu\bar h^{\mu\nu}\|_2 \simeq$ (3.9420e-12, 1.7310e-03, 8.9752e-05, 9.0076e-16), with the
rms calculation taken over the (relatively coarse) dual-nodal subdomain grids. The
first and last components of $\partial_\mu\bar h^{\mu\nu}$ vanish for the exact PriceLG solution.

Our next test is SchwarzH, the Schwarzschild metric in harmonic coordinates:

$$\bar h^{tt} = -1 + r^{-2}(r - M)^{-1}(r + M)^3, \qquad \bar h^{jk} = r^{-2}M^2\nu^j\nu^k. \tag{11}$$

Here the radius $r$ and the direction cosines $\nu^k = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ are
chosen relative to a point $(x_0, y_0, z_0)$ = (-0.9+1.37e-3, -1.6854e-4, 2.9985e-3)
which is off-center but close to the center of $\partial S_I^-$. The mass is $M = 0.05$, and
for this choice the horizon of the blackhole lies inside of (but is not concentric with)
$\partial S_I^-$. For this test $\Omega = 0$, and the exact solution (11) seeds inner Dirichlet boundary
conditions on both $\partial S_I^-$ and $\partial S_{II}^-$. On $\partial S_{out}^+$, rather than radiation conditions, we adopt
an inhomogeneous Neumann condition based on the exact solution. Table 2 lists


Table 1 PriceLG solution

Truncation   Shell error   Block error   tGMRES     iGMRES
23,114       1.1003e-05    4.6050e-06    0.50e-05   7
93,067       5.4950e-08    8.0179e-08    0.50e-07   5
271,197      4.7606e-10    2.7153e-10    0.50e-09   4
553,149      1.3446e-12    1.3198e-12    0.50e-11   4

Here relative $L_2$ errors are listed only for $\psi^{(nn)}$. The lowest resolution run has zero initial iterate;
afterwards the initial iterate stems from the previous solution. Respectively, tGMRES and iGMRES
are the tolerance and iteration number for the GMRES solve


Table 2 SchwarzH solution

Truncation   Shell error   Block error   tBROY     iBROY
23,114       6.5414e-05    1.5238e-05    5.0e-05   6
93,067       1.7162e-06    3.3810e-07    5.0e-07   4
271,197      9.5944e-09    1.6022e-09    5.0e-09   4
553,149      1.1266e-11    3.6899e-12    5.0e-11   4

As for the PriceLG test, only errors for $\psi^{(nn)}$ are listed
Table 3 Full GR with the same boundary conditions as PriceLG

Truncation   Shell error (GR)   Block error (GR)   Residual     iBROY
23,114       1.5303e-05         7.6669e-06         3.7429e-07   7
93,067       1.3194e-07         1.4670e-07         8.3651e-10   6
271,197      1.2266e-10         2.4074e-10         2.9149e-11   4
553,149      -                  -                  4.0790e-13   4


errors with the same meanings as before. Now $\|\partial_\mu\bar h^{\mu\nu}\|_{rms}$ for each $\nu$ is comparable
to the corresponding table errors; i.e. the gauge constraint converges to zero. Table 2
also lists the number iBROY of iterations performed by the Broyden solver to
achieve the tolerance tBROY. Each Broyden iteration itself involves a linear solve
via GMRES. Each GMRES tolerance is tBROY/10, the same as the corresponding
line in Table 1.


4.2 Gauge Constraint Tests 


Our next two tests explore to what extent the gauge constraint $\partial_\mu\bar h^{\mu\nu} = 0$ is satisfied
for the Einstein problem (5) and (6) in a binary scenario. For the first test we redo
the PriceLG test, except now with the Einstein equations. The inner and outer
boundary conditions are exactly as before. Table 3 again lists errors for $\psi^{(nn)}$ with
the same meanings as before. Errors are computed against the finest-resolution
numerical solution. The table also lists the $L_2$-norm of the nonlinear residual.
The last table line has $\|\partial_\mu\bar h^{\mu\nu}\|_2 \simeq$ (3.2031e-03, 5.1309e-03, 4.5881e-03,
4.6711e-03). That is, the gauge constraint does not converge to zero. Since the
harmonic gauge is not satisfied, we cannot view these as solutions to the Einstein
equations!

Presumably, the violation of the harmonic gauge in the preceding example
stems from the fact that $\partial_\mu\bar h^{\mu\nu} \neq 0$ for the Price solution used to fix the inner
boundary conditions. Beetle, Bromley, Hernández, and Price have given a
refined set of inner boundary conditions, based on the near-field asymptotics of
a moving Schwarzschild blackhole and meant to improve on the point-particle
boundary conditions. The appendix lists (our understanding of) these conditions.
With the hope that these refined boundary conditions will result in lower gauge errors, we have
performed the previous test with them. Convergence of the numerical solution is
similar, but $\|\partial_\mu\bar h^{\mu\nu}\|_{rms}$ has comparable size and still does not converge to zero.


5 Conclusions and Acknowledgments 


Our tentative conclusion for helically symmetric BBH models is that violation of 
the gauge constraint stems from imperfect inner boundary conditions. We have also
found a persistent gauge error in our BNS models, although it is several orders of
magnitude smaller in that context. The BNS model, with stars in place of excised
regions, involves no inner boundary conditions. We believe that in this context it is 
the outer boundary conditions which give rise to the constraint violation. Likely, at 
both the inner and outer boundaries some of the helical scalars need to be fixed using 
the gauge constraint itself (similar to "constraint preserving boundary conditions" 
used in evolution codes). The author is grateful for correspondence with Richard 
H. Price, and for assistance from UNM's Center for Advanced Research Computing. 


Appendix: Beetle, Bromley, Hernández, and Price Inner 
Boundary Conditions 


This appendix lists expansions for the helical scalars which somewhat generalize
the ones in [1]. We need two expansions, one for a “particle” at $(-a_I, 0, 0)$, and one
at $(a_{II}, 0, 0)$. Each has its own mass $M_{I,II}$. The top choice of $\pm$ or $\mp$ refers to II
and the bottom to I. For both $a_I$ and $a_{II}$, we define $v = a\Omega$, $\gamma = (1 - v^2)^{-1/2}$,
R = yA = vypsin(g + 2А), КК = —рсозф + асоѕ(2А) + vyRsin(Qd), and 
К! = p sin gta sin(f2A)-Evy f cos(42A). For A see after (2). Assuming M/R « 1, 
we have 


4M | 7M?) МО — yR? 
(nn) 4,2 [2 МОА YR) р 
A й | RC RA (12a) 
ам | 7M? М?0. — yR)z 
(n0) А, o. [| —— AMET gig Ne ‘oe 
Y 7м?\ M? p2 — 2yaR+ (2+ 12у?) 
poa с Ж E „л 
VIAR R VR 
y OO vy (ам 7M?| м2 -32 -2yXR + 2+ оу?) 
WEAR я NS 
(12d) 
io; [4M , 7M?V | MA- yR(KF iK!) 
aD o diy y 2912. 
aliia d. e = т === (12е) 
АМ 7M? M?z(KF + iK) 
eD ~o. [2 Mz TIR) M 
y | R ш R2 RA (12f) 
Worried about a possible sign discrepancy with the results in [1], we have also
considered (12a-g) with all correction terms (those with К in the denominator) 
flipped by a sign. 


Spectral Analysis of Isogeometric
Discretizations of 2D Curl-Div Problems
with General Geometry


Mariarosa Mazza, Carla Manni, and Hendrik Speleers 


1 Introduction 


In this paper we focus on isogeometric Galerkin discretizations of the weighted
curl-div operator

$$\mathcal{L}_{\alpha,\beta}\mathbf{u} := \alpha\nabla\times\nabla\times\mathbf{u} - \beta\nabla\nabla\cdot\mathbf{u}, \qquad 0 < \alpha, \beta. \tag{1}$$

This parameter-dependent operator appears in several problems, including the
Stokes equation and Maxwell equations [2]. Moreover, containing a weighting of
the curl and div operators, it captures the essential features of the so-called Alfvén-like
operator [14], which is of interest in magnetohydrodynamics [15]. We note
that $\mathcal{L}_{\alpha,\beta}$ can be seen as a weighted Laplacian for vector fields (equivalently, Hodge
Laplace for 1-forms). Indeed, when $\alpha = \beta = 1$, it is equal to the standard (negative)
vector Laplace operator, i.e.,

$$\nabla\times\nabla\times\mathbf{u} - \nabla\nabla\cdot\mathbf{u} = -\Delta\mathbf{u}.$$


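We assume that (1) is defined on a sufficiently smooth domain $\Omega \subset \mathbb{R}^2$ that can be
described through a geometry map $G : [0, 1]^2 \to \Omega$, and we consider homogeneous
Dirichlet (no-slip) boundary conditions, i.e., $\mathbf{u} = 0$ on $\partial\Omega$. This leads us to the
following variational formulation

$$(\mathcal{L}_{\alpha,\beta}\mathbf{u}, \mathbf{v}) = \alpha(\nabla\times\mathbf{u}, \nabla\times\mathbf{v}) + \beta(\nabla\cdot\mathbf{u}, \nabla\cdot\mathbf{v}), \qquad \mathbf{u}, \mathbf{v} \in (H_0^1(\Omega))^2. \tag{2}$$

We refer the reader to [3, 15] for a discussion about well-posedness.
To find an approximate solution of the problem $\mathcal{L}_{\alpha,\beta}\mathbf{u} = \mathbf{f}$, with the stated
boundary conditions, we consider the variational formulation (2) in a finite dimensional
vector space $V_h \subset (H_0^1(\Omega))^2$, i.e.,

$$(\mathcal{L}_{\alpha,\beta}\mathbf{u}_h, \mathbf{v}_h) = \alpha(\nabla\times\mathbf{u}_h, \nabla\times\mathbf{v}_h) + \beta(\nabla\cdot\mathbf{u}_h, \nabla\cdot\mathbf{v}_h), \qquad \mathbf{u}_h, \mathbf{v}_h \in V_h. \tag{3}$$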


We focus on isogeometric analysis (IgA) as discretization technique, where the
approximation space $V_h$ is chosen to be composed of vector fields whose components
are linear combinations of tensor-product B-splines mapped according to $G$.

The discretization (3) leads to solving linear systems, which turn out to be
severely ill-conditioned and require ad hoc fast solvers for a proper treatment
[4, 6, 15]. This requires a deep understanding of the spectral properties of the related
matrices. They depend on many factors: the problem parameters $\alpha$, $\beta$, the basic curl
and div operators, the mesh-size, the degree of the B-spline approximation, and the
map $G$ used to describe the geometry of the computational domain.

In this paper we provide a spectral study of these matrices using the theory of 
(multilevel block) Toeplitz [13, 17, 19] and generalized locally Toeplitz [10-12] 
sequences. More precisely, we show that such matrices admit a spectral distribution 
which can be described in terms of a so-called spectral symbol. We determine this 
spectral symbol and we reveal its dependence on the characteristic parameters of 
the problem listed above. The spectral analysis presented in this paper extends the 
results of [15] to the case of non-trivial geometry and relies on the spectral theory 
developed for isogeometric discretizations of elliptic problems in [7, 8]. We also 
refer the reader to [16] for a spectral analysis of the curl-curl operator. 

The remainder of the paper is organized as follows. In Sect. 2 we introduce
notations and definitions relevant for our spectral analysis, and we recall the basics 
of B-splines. In Sect. 3 we detail the IgA discretization matrices and we perform a 
spectral analysis of them. We numerically illustrate those results in Sect. 4. Finally, 
we conclude the paper in Sect. 5. 


2 Preliminaries 


In this section we collect some preliminary tools on spectral analysis and IgA 
discretizations. In particular, we recall the formal definition of spectral distribution 
for a general matrix-sequence and the definition of (cardinal) B-splines. 


2.1 Spectral Distribution


Throughout the paper, we follow the standard convention for operations with multi-
indices (see e.g. [9, 18]). Given a multi-index $\boldsymbol{n} := (n_1, \ldots, n_d) \in \mathbb{N}^d$, we say
$\boldsymbol{n} \to \infty$ if $n_i \to \infty$, $i = 1, \ldots, d$. Let $C_c(\mathbb{C})$ be the set of continuous functions
$F : \mathbb{C} \to \mathbb{C}$ with compact support.


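Definition 1 Let $f : D \to \mathbb{C}^{s\times s}$ be a measurable matrix-valued function, defined
on a measurable set $D \subset \mathbb{R}^d$ with $d \ge 1$, $0 < \mu_d(D) < \infty$, where $\mu_d$ is the
Lebesgue measure. Let $\{A_{\boldsymbol{n}}\}_{\boldsymbol{n}}$ be a matrix-sequence with $\dim(A_{\boldsymbol{n}}) =: d_{\boldsymbol{n}}$ and $d_{\boldsymbol{n}} \to
\infty$ as $\boldsymbol{n} \to \infty$. Then, $\{A_{\boldsymbol{n}}\}_{\boldsymbol{n}}$ is distributed like the pair $(f, D)$ in the sense of the
eigenvalues, denoted by $\{A_{\boldsymbol{n}}\}_{\boldsymbol{n}} \sim_\lambda (f, D)$, if the following limit relation holds for
all $F \in C_c(\mathbb{C})$:

$$\lim_{\boldsymbol{n}\to\infty}\frac{1}{d_{\boldsymbol{n}}}\sum_{j=1}^{d_{\boldsymbol{n}}}F(\lambda_j(A_{\boldsymbol{n}})) = \frac{1}{\mu_d(D)}\int_D\frac{\sum_{i=1}^{s}F(\lambda_i(f(\boldsymbol{x})))}{s}\,\mathrm{d}\boldsymbol{x}, \tag{4}$$

where $\lambda_j(A_{\boldsymbol{n}})$, $j = 1, \ldots, d_{\boldsymbol{n}}$ are the eigenvalues of $A_{\boldsymbol{n}}$ and $\lambda_i(f)$, $i = 1, \ldots, s$ are
the eigenvalues of $f$. We say that $f$ is the (spectral) symbol of the matrix-sequence
$\{A_{\boldsymbol{n}}\}_{\boldsymbol{n}}$.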


If $f$ is smooth enough and the matrix-size of $A_{\boldsymbol{n}}$ is sufficiently large, then the
limit relation (4) has the following informal meaning: a first set of $d_{\boldsymbol{n}}/s$ eigenvalues
of $A_{\boldsymbol{n}}$ is approximated by a sampling of $\lambda_1(f)$ on a uniform equispaced grid of the
domain $D$, a second set of $d_{\boldsymbol{n}}/s$ eigenvalues of $A_{\boldsymbol{n}}$ is approximated by a sampling of
$\lambda_2(f)$ on a uniform equispaced grid of the domain $D$, and so on, up to few outliers.
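The informal meaning of the symbol can be seen on the simplest scalar example ($s = 1$, $d = 1$). The sketch below is an illustration chosen here, not an example from the paper: it takes the tridiagonal Toeplitz matrix with stencil $(-1, 2, -1)$, whose symbol is $f(\theta) = 2 - 2\cos\theta$ on $D = [0, \pi]$, and compares its eigenvalues with samples of $f$ on the uniform grid $\theta_j = j\pi/(n+1)$. For this particular matrix the agreement is exact up to round-off, with no outliers.

```python
import numpy as np

def laplacian_toeplitz(n):
    """Tridiagonal Toeplitz matrix with stencil (-1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def symbol(theta):
    """Spectral symbol f(theta) = 2 - 2 cos(theta) of the matrix-sequence."""
    return 2.0 - 2.0 * np.cos(theta)

def symbol_sampling_error(n):
    """Max difference between sorted eigenvalues and sorted uniform samples of f."""
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian_toeplitz(n)))
    theta = np.arange(1, n + 1) * np.pi / (n + 1)   # uniform grid in (0, pi)
    return np.max(np.abs(eigenvalues - np.sort(symbol(theta))))

if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, symbol_sampling_error(n))
```

For matrix-valued symbols ($s > 1$), such as the $2 \times 2$ symbols arising for vector problems, the same comparison would be made branch by branch using the eigenvalue functions $\lambda_i(f)$.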

In general, understanding whether a matrix-sequence admits a symbol and how 
to compute it is not an easy task. On the other hand, any “reasonable” approximation 
of partial differential equations by local methods leads to matrix-sequences that are 
in the so-called generalized locally Toeplitz (GLT) algebra, and so admit a symbol 
[10-12]. The IgA discretization of our curl-div problem (3) fits in this frame. 


2.2 B-Splines 
For $p \ge 0$ and $n \ge 1$, consider the uniform knot sequence

$$\xi_1 = \cdots = \xi_{p+1} := 0 < \xi_{p+2} < \cdots < \xi_{p+n} < 1 =: \xi_{p+n+1} = \cdots = \xi_{2p+n+1},$$

where $\xi_{p+i+1} := i/n$, $i = 0, \ldots, n$. This knot sequence allows us to define $n + p$
B-splines of degree $p$. Let $\chi_I$ denote the characteristic function on the interval $I$.


Definition 2 The B-splines of degree $p$ over a uniform mesh of $[0, 1]$, consisting
of $n$ intervals, are denoted by $N_i^p : [0, 1] \to \mathbb{R}$, $i = 1, \ldots, n + p$, and defined
recursively as follows: for $1 \le i \le n + 2p$,

$$N_i^0(x) := \chi_{[\xi_i, \xi_{i+1})}(x);$$

for $1 \le k \le p$ and $1 \le i \le n + 2p - k$,

$$N_i^k(x) := \frac{x - \xi_i}{\xi_{i+k} - \xi_i}N_i^{k-1}(x) + \frac{\xi_{i+k+1} - x}{\xi_{i+k+1} - \xi_{i+1}}N_{i+1}^{k-1}(x),$$

where a fraction with zero denominator is assumed to be zero.


It is well known (see e.g. [1]) that the B-splines $N_i^p$, $i = 1, \ldots, n + p$, form a
basis, and

$$N_i^p(0) = N_i^p(1) = 0, \qquad i = 2, \ldots, n + p - 1. \tag{5}$$


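The central B-splines $N_i^p$, $i = p + 1, \ldots, n$, are uniformly shifted and scaled
versions of a single shape function, the so-called cardinal B-spline $\phi_p : \mathbb{R} \to \mathbb{R}$,

$$\phi_0(t) := \chi_{[0,1)}(t), \qquad \phi_p(t) := \frac{t}{p}\phi_{p-1}(t) + \frac{p + 1 - t}{p}\phi_{p-1}(t - 1), \quad p \ge 1.$$

More precisely, we have

$$N_i^p(x) = \phi_p(nx - i + p + 1), \qquad i = p + 1, \ldots, n.$$

The cardinal B-spline $\phi_p$ is a $C^{p-1}$ function which is locally supported on the
interval $[0, p + 1]$.
Finally, we recall the definition of tensor-product B-splines.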
Finally, we recall the definition of tensor-product B-splines. 


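Definition 3 The tensor-product B-splines of bi-degree $\boldsymbol{p} := (p_1, p_2)$ over a
uniform mesh of $[0, 1]^2$, consisting of $\boldsymbol{n} := (n_1, n_2)$ intervals in each direction,
are denoted by $N_{\boldsymbol{i}}^{\boldsymbol{p}} : [0, 1]^2 \to \mathbb{R}$, $\boldsymbol{i} = \boldsymbol{1}, \ldots, \boldsymbol{n} + \boldsymbol{p}$, and defined as

$$N_{\boldsymbol{i}}^{\boldsymbol{p}} := N_{i_1}^{p_1} \otimes N_{i_2}^{p_2},$$

where $\boldsymbol{1} := (1, 1)$ and $\boldsymbol{i} := (i_1, i_2) \in \mathbb{N}^2$.


We define the tensor-product spline space $\mathbb{S}_{\boldsymbol{n}}^{\boldsymbol{p}}$ as

$$\mathbb{S}_{\boldsymbol{n}}^{\boldsymbol{p}} := \mathrm{span}\{N_{\boldsymbol{i}}^{\boldsymbol{p}} : \boldsymbol{i} = \boldsymbol{2}, \ldots, \boldsymbol{n} + \boldsymbol{p} - \boldsymbol{1}\}. \tag{6}$$

Note that all the elements of this space vanish at the boundary of $[0, 1]^2$; see (5).
Hence, the space incorporates homogeneous Dirichlet boundary conditions.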


3 Spectral Analysis of Isogeometric Discretizations in 2D


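Suppose that the physical domain $\Omega$ can be described by a global geometry map,
$G := [G_1, G_2]^T$, $G : \hat\Omega \to \Omega$, which is invertible in the parametric domain $\hat\Omega :=
[0, 1]^2$ and satisfies $G(\partial\hat\Omega) = \partial\Omega$. Let

$$V_h := \mathrm{span}\{\boldsymbol{\varphi}^1_{i_1,i_2}, \boldsymbol{\varphi}^2_{j_1,j_2} : i_l, j_l = 2, \ldots, n + p - 1,\ l = 1, 2\}, \tag{7}$$

where

$$\boldsymbol{\varphi}^1_{i_1,i_2} := \begin{bmatrix}\varphi_{i_1,i_2}\\ 0\end{bmatrix}, \qquad \boldsymbol{\varphi}^2_{j_1,j_2} := \begin{bmatrix}0\\ \varphi_{j_1,j_2}\end{bmatrix},$$

and for $k_l \in \{i_l, j_l\}$, $l = 1, 2$,

$$\varphi_{k_1,k_2}(x_1, x_2) := \hat\varphi_{k_1,k_2}(G^{-1}(x_1, x_2)) = \hat\varphi_{k_1,k_2}(\hat x_1, \hat x_2), \qquad (x_1, x_2) = G(\hat x_1, \hat x_2).$$

Then, we set $\hat\varphi_{k_1,k_2} := N_{k_1}^p \otimes N_{k_2}^p$, i.e., the tensor-product B-splines in (6). For
simplicity of notation, we have taken $n_1 = n_2 = n$ and $p_1 = p_2 = p$. Also note
that

$$\nabla\varphi_{k_1,k_2} = (J_G)^{-T}\,\nabla(N_{k_1}^p \otimes N_{k_2}^p) = \frac{1}{\det(J_G)}\begin{bmatrix}\dfrac{\partial G_2}{\partial\hat x_2}(N_{k_1}^p)' \otimes N_{k_2}^p - \dfrac{\partial G_2}{\partial\hat x_1}N_{k_1}^p \otimes (N_{k_2}^p)'\\[2mm] -\dfrac{\partial G_1}{\partial\hat x_2}(N_{k_1}^p)' \otimes N_{k_2}^p + \dfrac{\partial G_1}{\partial\hat x_1}N_{k_1}^p \otimes (N_{k_2}^p)'\end{bmatrix},$$

where

$$J_G := \begin{bmatrix}\dfrac{\partial G_1}{\partial\hat x_1} & \dfrac{\partial G_1}{\partial\hat x_2}\\[2mm] \dfrac{\partial G_2}{\partial\hat x_1} & \dfrac{\partial G_2}{\partial\hat x_2}\end{bmatrix}.$$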

In the following, we start by discussing the coefficient matrices arising from 
the IgA discretization of a generalized Poisson problem. Then, we construct the 
coefficient matrices related to the IgA discretization of our curl-div problem (3) 
using (7), and we perform a spectral analysis. 


3.1 Matrices Related to a Generalized Poisson Problem 


Let us focus on the following bivariate generalized Poisson operator: 


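$$\mathcal{L}_K u := -\nabla\cdot K\nabla u, \tag{8}$$

where $K : \Omega \to \mathbb{R}^{2\times 2}$, and consider homogeneous Dirichlet boundary conditions,
i.e., $u = 0$ on $\partial\Omega$. From [8] we know that the Galerkin discretization of (8) using
one component of the space (7) leads to the coefficient matrix $A_{\boldsymbol{n}}^{p,K}$ defined by

$$[A_{\boldsymbol{n}}^{p,K}]_{\boldsymbol{i},\boldsymbol{j}} = \int_{[0,1]^2}\big[\nabla(N_{j_1+1}^p \otimes N_{j_2+1}^p)\big]^T K_G\,\nabla(N_{i_1+1}^p \otimes N_{i_2+1}^p)\,|\det(J_G)|,$$

where

$$K_G := (J_G)^{-1}K(G)(J_G)^{-T}.$$

It has been proved in [8] that such matrices admit a spectral distribution according
to Definition 1. To this end, let us define

$$H_p := \begin{bmatrix}s_p \otimes m_p & a_p \otimes a_p\\ a_p \otimes a_p & m_p \otimes s_p\end{bmatrix}$$

with

$$m_p(\theta) := \phi_{2p+1}(p + 1) + 2\sum_{k=1}^{p}\phi_{2p+1}(p + 1 - k)\cos(k\theta),$$

$$a_p(\theta) := -2\sum_{k=1}^{p}\phi'_{2p+1}(p + 1 - k)\sin(k\theta),$$

$$s_p(\theta) := -\phi''_{2p+1}(p + 1) - 2\sum_{k=1}^{p}\phi''_{2p+1}(p + 1 - k)\cos(k\theta).$$

Theorem 1 Let $G$ be a regular geometry map, i.e., $G \in C^1([0, 1]^2)$ and $\det(J_G) \neq
0$ in $[0, 1]^2$, and let $K$ be a symmetric matrix. Then, the matrix-sequence $\{A_{\boldsymbol{n}}^{p,K}\}_{\boldsymbol{n}}$
with $\boldsymbol{n} = (n, n)$ is distributed, in the sense of the eigenvalues, like the function

$$f_G^{p,K}(\hat{\boldsymbol{x}}, \boldsymbol{\theta}) = [1\ 1]\,\big(|\det(J_G(\hat{\boldsymbol{x}}))|\,K_G(\hat{\boldsymbol{x}}) \circ H_p(\boldsymbol{\theta})\big)\,[1\ 1]^T, \tag{9}$$

where $\hat{\boldsymbol{x}} \in [0, 1]^2$, $\boldsymbol{\theta} \in [-\pi, \pi]^2$, and $\circ$ is the Hadamard matrix product.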


We refer the reader to [8, 9] for a detailed discussion about the symbol (9). 
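The building blocks of the symbol, the univariate functions $m_p$, $a_p$, and $s_p$, only require values and derivatives of the cardinal B-spline $\phi_{2p+1}$ at integer points. The sketch below evaluates them from the standard recurrence for $\phi_p$ together with the derivative identity $\phi_p'(t) = \phi_{p-1}(t) - \phi_{p-1}(t-1)$. It is written under the assumption that $m_p$, $a_p$, $s_p$ are the mass, advection, and stiffness symbols in the form commonly used in the isogeometric literature; the function names are hypothetical. For $p = 1$ it reproduces the familiar linear finite element symbols $(2 + \cos\theta)/3$ and $2 - 2\cos\theta$.

```python
import math

def cardinal(p, t, deriv=0):
    """Cardinal B-spline phi_p(t), or its derivative of order `deriv` (deriv <= p - 1)."""
    if deriv > 0:
        # phi_p^{(r)}(t) = phi_{p-1}^{(r-1)}(t) - phi_{p-1}^{(r-1)}(t - 1)
        return cardinal(p - 1, t, deriv - 1) - cardinal(p - 1, t - 1, deriv - 1)
    if p == 0:
        return 1.0 if 0.0 <= t < 1.0 else 0.0
    return (t * cardinal(p - 1, t) + (p + 1 - t) * cardinal(p - 1, t - 1)) / p

def m_p(p, theta):
    """Mass-type symbol."""
    q = 2 * p + 1
    return cardinal(q, p + 1) + 2.0 * sum(
        cardinal(q, p + 1 - k) * math.cos(k * theta) for k in range(1, p + 1))

def a_p(p, theta):
    """Advection-type symbol."""
    q = 2 * p + 1
    return -2.0 * sum(
        cardinal(q, p + 1 - k, deriv=1) * math.sin(k * theta) for k in range(1, p + 1))

def s_p(p, theta):
    """Stiffness-type symbol."""
    q = 2 * p + 1
    return -cardinal(q, p + 1, deriv=2) - 2.0 * sum(
        cardinal(q, p + 1 - k, deriv=2) * math.cos(k * theta) for k in range(1, p + 1))

if __name__ == "__main__":
    for theta in (0.0, 1.0, math.pi):
        print(theta, m_p(2, theta), a_p(2, theta), s_p(2, theta))
```

The recursive evaluation is exponential in $p$ and is meant only to make the definitions concrete for small degrees.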


3.2 Matrices Related to Our Curl-Div Problem


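We can reformulate (1) in 2D as

$$\mathcal{L}_{\alpha,\beta}\mathbf{u} = \alpha\begin{bmatrix}\dfrac{\partial^2 u_2}{\partial x_1\partial x_2} - \dfrac{\partial^2 u_1}{\partial x_2^2}\\[2mm] \dfrac{\partial^2 u_1}{\partial x_1\partial x_2} - \dfrac{\partial^2 u_2}{\partial x_1^2}\end{bmatrix} - \beta\begin{bmatrix}\dfrac{\partial^2 u_1}{\partial x_1^2} + \dfrac{\partial^2 u_2}{\partial x_1\partial x_2}\\[2mm] \dfrac{\partial^2 u_1}{\partial x_1\partial x_2} + \dfrac{\partial^2 u_2}{\partial x_2^2}\end{bmatrix}, \tag{10}$$

where $\mathbf{u}(x_1, x_2) := [u_1(x_1, x_2), u_2(x_1, x_2)]^T$. When discretizing the weak form (3)
using the space (7) we arrive at the $2 \times 2$ block matrix

$$A_{\boldsymbol{n}}^{p,\alpha,\beta} := \alpha\begin{bmatrix}A_{\boldsymbol{n},11}^{p,\mathrm{curl}} & A_{\boldsymbol{n},12}^{p,\mathrm{curl}}\\ A_{\boldsymbol{n},21}^{p,\mathrm{curl}} & A_{\boldsymbol{n},22}^{p,\mathrm{curl}}\end{bmatrix} + \beta\begin{bmatrix}A_{\boldsymbol{n},11}^{p,\mathrm{div}} & A_{\boldsymbol{n},12}^{p,\mathrm{div}}\\ A_{\boldsymbol{n},21}^{p,\mathrm{div}} & A_{\boldsymbol{n},22}^{p,\mathrm{div}}\end{bmatrix}.$$

The blocks related to the curl-curl operator $(\nabla\times\cdot, \nabla\times\cdot)$ are given by

$$[A_{\boldsymbol{n},11}^{p,\mathrm{curl}}]_{\boldsymbol{i},\boldsymbol{j}} = \int_{[0,1]^2}\Big[-\frac{\partial G_1}{\partial\hat x_2}(N_{j_1+1}^p)' \otimes N_{j_2+1}^p + \frac{\partial G_1}{\partial\hat x_1}N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\Big]\Big[-\frac{\partial G_1}{\partial\hat x_2}(N_{i_1+1}^p)' \otimes N_{i_2+1}^p + \frac{\partial G_1}{\partial\hat x_1}N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\Big]\frac{1}{|\det(J_G)|},$$

$$[A_{\boldsymbol{n},12}^{p,\mathrm{curl}}]_{\boldsymbol{i},\boldsymbol{j}} = -\int_{[0,1]^2}\Big[\frac{\partial G_2}{\partial\hat x_2}(N_{j_1+1}^p)' \otimes N_{j_2+1}^p - \frac{\partial G_2}{\partial\hat x_1}N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\Big]\Big[-\frac{\partial G_1}{\partial\hat x_2}(N_{i_1+1}^p)' \otimes N_{i_2+1}^p + \frac{\partial G_1}{\partial\hat x_1}N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\Big]\frac{1}{|\det(J_G)|},$$

$$[A_{\boldsymbol{n},22}^{p,\mathrm{curl}}]_{\boldsymbol{i},\boldsymbol{j}} = \int_{[0,1]^2}\Big[\frac{\partial G_2}{\partial\hat x_2}(N_{j_1+1}^p)' \otimes N_{j_2+1}^p - \frac{\partial G_2}{\partial\hat x_1}N_{j_1+1}^p \otimes (N_{j_2+1}^p)'\Big]\Big[\frac{\partial G_2}{\partial\hat x_2}(N_{i_1+1}^p)' \otimes N_{i_2+1}^p - \frac{\partial G_2}{\partial\hat x_1}N_{i_1+1}^p \otimes (N_{i_2+1}^p)'\Big]\frac{1}{|\det(J_G)|},$$

and $A_{\boldsymbol{n},21}^{p,\mathrm{curl}} = A_{\boldsymbol{n},12}^{p,\mathrm{curl}}$. Note that all those blocks are symmetric matrices. Similarly,
the blocks related to the div-div operator $(\nabla\cdot, \nabla\cdot)$ are given by (see also (10))

$$A_{\boldsymbol{n},11}^{p,\mathrm{div}} = A_{\boldsymbol{n},22}^{p,\mathrm{curl}}, \qquad A_{\boldsymbol{n},12}^{p,\mathrm{div}} = A_{\boldsymbol{n},21}^{p,\mathrm{div}} = -A_{\boldsymbol{n},12}^{p,\mathrm{curl}}, \qquad A_{\boldsymbol{n},22}^{p,\mathrm{div}} = A_{\boldsymbol{n},11}^{p,\mathrm{curl}}.$$

In the next subsection we compute the symbol of the matrix-sequence $\{A_{\boldsymbol{n}}^{p,\alpha,\beta}\}_{\boldsymbol{n}}$.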


3.3 Spectral Symbol of Curl-Div Matrices $A_{\boldsymbol{n}}^{p,\alpha,\beta}$


We are now ready for the main contribution of the paper: we show that the matrix-
sequence $\{A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}\}_{\boldsymbol{n}}$ admits a spectral distribution according to Definition 1. This
extends the computation in [15] to the case of non-trivial geometry.


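Theorem 2 Let G be a regular geometry map, i.e., $G \in C^1([0,1]^2)$ and $\det(J_G) \neq 0$ in $[0,1]^2$. Then, the matrix-sequence $\{A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}\}_{\boldsymbol{n}}$ with $\boldsymbol{n} = (n, n)$ is distributed, in
the sense of the eigenvalues, like the 2 × 2 matrix-valued function

$$ f^{\boldsymbol{p},\alpha,\beta}_G(\hat{x},\theta) = \alpha f^{\boldsymbol{p},\mathrm{curl}}_G(\hat{x},\theta) + \beta f^{\boldsymbol{p},\mathrm{div}}_G(\hat{x},\theta), \qquad (11) $$

where $\hat{x} \in [0,1]^2$, $\theta \in [-\pi,\pi]^2$, and

$$ f^{\boldsymbol{p},\mathrm{curl}}_G(\hat{x},\theta) := \frac{1}{|\det(J_G(\hat{x}))|}\, J_G(\hat{x})\, P\, H_{\boldsymbol{p}}(\theta)\, P^T (J_G(\hat{x}))^T, \qquad P := \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, $$

$$ f^{\boldsymbol{p},\mathrm{div}}_G(\hat{x},\theta) := |\det(J_G(\hat{x}))|\, (J_G(\hat{x}))^{-T} H_{\boldsymbol{p}}(\theta)\, (J_G(\hat{x}))^{-1}. $$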
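As a quick sanity check of the structure of the symbol (a sketch added for illustration, not part of the original argument), consider the trivial geometry, for which $J_G$ is the identity. The entries $h_{ij}$ of $H_{\boldsymbol{p}}(\theta)$ are a notation assumed here, and $P$ is taken to be the 2 × 2 rotation matrix with rows $(0, 1)$ and $(-1, 0)$. Then the div-div part of the symbol reduces to $H_{\boldsymbol{p}}(\theta)$ itself, while the curl-curl part becomes:

```latex
\[
P\,H_{\boldsymbol{p}}(\theta)\,P^T
= \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
  \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}
  \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} h_{21} & h_{22} \\ -h_{11} & -h_{12} \end{bmatrix}
  \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} h_{22} & -h_{21} \\ -h_{12} & h_{11} \end{bmatrix}.
\]
```

The diagonal entries are swapped and the off-diagonal entries change sign, which mirrors the relations between the curl-curl and div-div blocks in Sect. 3.2.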


Proof From (10) it follows that the block $A^{\boldsymbol{p},\mathrm{curl}}_{\boldsymbol{n},11}$ corresponds to the isogeometric
discretization of $-\partial^2/\partial x_2^2$. By means of a direct computation we can verify that
Theorem 1, with

$$ K = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, $$

ensures that the matrix-sequence $\{A^{\boldsymbol{p},\mathrm{curl}}_{\boldsymbol{n},11}\}_{\boldsymbol{n}}$ is distributed in the sense of the
eigenvalues like the entry (1, 1) of the matrix $f^{\boldsymbol{p},\mathrm{curl}}_G$. The same argument (using
a suitable matrix K) can also be applied to the remaining blocks. Then, it can be
checked that all the considered blocks satisfy the hypotheses of [10, Theorem 5],
which implies that $A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}$ is similar, via a proper permutation matrix, to a matrix
whose associated matrix-sequence has its symbol given by (11). □


In the context of IgA, the geometry map G is expressed in terms of the same 
B-spline basis as used for the discretization space. However, as can be seen from 
the proof, the spectral result in the above theorem holds for any (smooth enough) 
geometry map. 

Finally, we remark that the p-dependence of the symbol in (11) is completely 
captured by the matrix $H_{\boldsymbol{p}}(\theta)$. As described in Sect. 3.1 this matrix also appears in
the symbol expression of a generalized Poisson problem; its properties have been 
discussed in [5, 8]. 

4 Numerical Example


In this section we numerically illustrate the spectral results obtained in Sect. 3.3, 
using the same test problem as in [15, Sect. 5]. More precisely, we consider (3) 
defined on a quarter of an annulus, 


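$$ \Omega = \{(x_1, x_2) \in \mathbb{R}^2 : r^2 < x_1^2 + x_2^2 < R^2,\ x_1 > 0,\ x_2 > 0\}, \qquad r = 1, \quad R = 4, $$

with

$$ G(\hat{x}_1, \hat{x}_2) = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} [r + \hat{x}_1 (R - r)] \cos\left(\frac{\pi}{2}\hat{x}_2\right) \\[1mm] [r + \hat{x}_1 (R - r)] \sin\left(\frac{\pi}{2}\hat{x}_2\right) \end{bmatrix}. $$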
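The regularity assumption of the theorems, a nonvanishing Jacobian determinant on the unit square, can be checked directly for this polar-type map of the quarter annulus. The following is a minimal sketch: the function names are hypothetical, and the Jacobian is derived by hand from the map, assuming its first parametric coordinate is radial and its second is angular.

```python
import numpy as np

R_INNER, R_OUTER = 1.0, 4.0  # r and R of the quarter annulus


def geometry_map(xh1, xh2, r=R_INNER, R=R_OUTER):
    """Map a point (xh1, xh2) of the unit square to the quarter annulus."""
    rho = r + xh1 * (R - r)
    return np.array([rho * np.cos(0.5 * np.pi * xh2),
                     rho * np.sin(0.5 * np.pi * xh2)])


def jacobian(xh1, xh2, r=R_INNER, R=R_OUTER):
    """Analytic Jacobian J_G with entries d x_i / d xh_k (hand-derived)."""
    rho = r + xh1 * (R - r)
    c = np.cos(0.5 * np.pi * xh2)
    s = np.sin(0.5 * np.pi * xh2)
    return np.array([[(R - r) * c, -0.5 * np.pi * rho * s],
                     [(R - r) * s, 0.5 * np.pi * rho * c]])


# Sample det(J_G) on a uniform grid of the unit square.
grid = np.linspace(0.0, 1.0, 11)
dets = np.array([[np.linalg.det(jacobian(a, b)) for b in grid] for a in grid])
min_det = dets.min()  # closed form: det(J_G) = (pi/2) (R - r) [r + xh1 (R - r)]
```

Since the determinant depends only on the radial coordinate and grows with it, its minimum is attained on the inner arc, so the map is regular in the sense required above.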


Let us fix $\boldsymbol{n} := (n, n) \in \mathbb{N}^2$, $\boldsymbol{p} := (p, p) \in \mathbb{N}^2$ and $m \in \mathbb{N}$ such that $m^2 =
n + p - 2$. We start by defining two equispaced grids on $[0, 1]^2$ and $[0, \pi]^2$:


‚ Bk-9,..m-l1 


Then, we denote by $\Lambda_i$ the set of all evaluations of $\lambda_i(f^{\boldsymbol{p},\alpha,\beta}_G)$ on $\Gamma := \{(\hat{x}_j, \theta_k),
j, k = 0, \ldots, m - 1\}$ for a fixed $i \in \{1, 2\}$. Note that it suffices to consider
only $[0, \pi]^2$ because the symbol (11) is symmetric on $[-\pi, \pi]^2$, and hence also
its eigenvalue functions.

In Fig. 1 we numerically check relation (11) by comparing the eigenvalues of
$A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}$ with the values collected in $\Lambda := \{\Lambda_1, \Lambda_2\}$, ordered in ascending way, for
α = 1 and β = 0.1. We observe that, in complete agreement with the theory,
the considered sampling of $\lambda_i(f^{\boldsymbol{p},\alpha,\beta}_G)$, i = 1, 2, describes quite accurately the
behavior of the eigenvalues of $A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}$, also for relatively small matrix-sizes, up to
a few outliers.


5 Conclusions 


We have analyzed the spectral properties of matrix-sequences arising from isogeo- 
metric Galerkin methods for weighted curl-div operators on general planar domains, 
considering a non-trivial geometry map. More precisely, we have shown that an 
(asymptotic) spectral distribution exists and it is compactly described by a 2 x 2 
spectral symbol. In other words, the eigenvalues of the matrices we are dealing 
with can be approximated accurately by a uniform sampling of the two eigenvalue 
functions of the 2 x 2 symbol matrix. The symbol depends on the characteristic 
parameters of the problem and on the geometry of the physical domain. Its formal
structure nicely mimics the structure of the differential problem. The numerical
results show a very good matching between the true eigenvalues and the estimates
provided by the symbol, already for relatively small matrix-sizes.

Fig. 1 Comparison of the eigenvalues of $A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}$ (open circle) with $\Lambda = \{\Lambda_1, \Lambda_2\}$ collecting
uniform samples of $\lambda_i(f^{\boldsymbol{p},\alpha,\beta}_G)$, i = 1, 2 (asterisk), ordered in ascending way, varying both n and
p, and fixing α = 1 and β = 0.1. (a) p = 3, n = 15. (b) p = 3, n = 35. (c) p = 4, n = 14. (d)
p = 4, n = 34. (e) p = 5, n = 13. (f) p = 5, n = 33

The convergence of iterative solvers for linear systems strongly depends on the 
spectral behavior of the corresponding coefficient matrices. Since the symbol gives a 
precise description of the spectrum of the curl-div matrix $A^{\boldsymbol{p},\alpha,\beta}_{G,\boldsymbol{n}}$, it could be helpful
in the design of good preconditioners that lead to better performance than current 
solution strategies, like the one in [15, Sect. 5]. 

Performance of Preconditioners
for Large-Scale Simulations Using
Nek5000


N. Offermans, A. Peplinski, O. Marin, E. Merzari, and P. Schlatter 


1 Introduction 


The preconditioning of elliptic problems characterized by the propagation of 
information at infinite speed over the domain is a numerically challenging task. 
We study the case of the Poisson equation arising from the numerical resolution 
of the incompressible Navier-Stokes equations by operator splitting. We consider 
Nek5000, a code based on the spectral element method, as our framework. The 
current preconditioning strategy is based on an additive Schwarz method, which 
combines a domain decomposition method [5] and a so-called coarse grid problem 
[10]. The first step consists in solving directly local overlapping Poisson problems 
and is easily parallelizable. The second step corresponds to a Poisson-like problem 
over the whole domain and is hard to scale because of its relatively low number of 
degrees of freedom and the bottleneck induced by global communication. 

A scalable solver for the coarse grid problem is critical to ensure strong scaling of 
the code. Existing strategies include a direct solution method similar to a Cholesky 
decomposition, called $XX^T$ [14], and an algebraic multigrid (AMG) solver [6].
While the first choice works well for relatively small problems (typically <100,000 
spectral elements on <10,000 cores), the second option is preferred for large scale 
simulations. The current AMG solver, which we will denote as the in-house AMG, 
is fast and scales well [11]. It has been shown that the use of AMG can speed up 
large-scale simulations by up to 10%. In addition, the $XX^T$ solver has been designed for
optimal performance on a number of cores which is a power of 2, whereas the AMG
is insensitive to this parameter.

However, the AMG solver requires a setup phase, performed once for each mesh, 
by an external and serial code. Besides inducing an unwanted overhead, it also 
limits the use of the in-house AMG solver in the framework of mesh refinement, 
which is the main motivation for this work. Therefore, we propose to replace the 
in-house AMG by BoomerAMG, a parallel AMG solver for arbitrary unstructured 
grids from the hypre library for linear algebra [1, 4, 8]. BoomerAMG offers a number 
of parallel algorithms for the coarsening, interpolation and smoothing steps of the 
AMG setup, to accommodate various types of problems, meshes and architectures. 
The BoomerAMG solver will be tested in terms of scalability and time to solution. 

Scaling tests for the BoomerAMG solver have been performed up to 4096 cores 
by Baker et al. [1]. Matrices arising from the finite element and finite difference 
discretizations of 2D and 3D scalar diffusion problems were considered. The authors 
used HMIS coarsening and extended+i interpolation and showed that $\ell_1$-scaled
Jacobi, $\ell_1$-scaled Gauss-Seidel and Chebyshev smoothers are good choices for such
problems. 

Weak scaling up to 125,000 cores has been presented in Ref. [2], where 
BoomerAMG was used as a preconditioner for a conjugate gradient solver. The test 
case considered is that of a 3D Laplace operator. The parameters for the AMG solver 
were again HMIS coarsening, extended+i interpolation and symmetric hybrid 
Gauss-Seidel for the smoother. Aggressive coarsening with multipass interpolation 
was used on the finest grid, while the problem on the coarsest level was solved by 
Gaussian elimination. The authors show the impact of additional parameters such 
as the use of 64 bits for the integers or the use of a hybrid parallel strategy with
OpenMP and MPI. 

In the present work, we use the BoomerAMG from hypre to precondition a 
GMRES solver for the pressure equation arising from the spectral element dis- 
cretization of the Navier-Stokes equations. We study strong scaling up to 131,072 
cores on two different supercomputers: Mira, based on the IBM Blue Gene/Q 
architecture, and Hazel Hen, a Cray XC40 system. The first test case considered 
is the flow around a NACA4412 airfoil, which we use to identify a set of best 
parameters for the BoomerAMG solver. A second test case is employed for a strong 
scaling study: the turbulent flow in wire-wrapped pin bundles [3, 13]. 

The paper is organized as follows. In Sect. 2, we introduce the discretization
method and describe the preconditioning strategy and the hypre library. In Sect. 3, 
we study which set of parameters gives the fastest time to solution for the problem 
at hand. Using those parameters, we perform a strong scaling study for the flow 
in wire-wrapped pin bundles in Sect. 4. We finish with conclusions and outlook in 
Sect. 5.

2 Problem Description


Considering an operator splitting strategy to solve the Navier-Stokes equations, the
consistent pressure p is the solution to a Laplace problem of the form $\Delta p = r$. In
Nek5000, this equation is discretized using the spectral element method [7] and it
has been shown that it is well preconditioned by an overlapping additive Schwarz
method. The preconditioner combines local problems $R_k^T A_k^{-1} R_k$ and a coarse grid
problem $R_0^T A_0^{-1} R_0$ and is expressed as

$$ M^{-1} = R_0^T A_0^{-1} R_0 + \sum_{k=1}^{K} R_k^T A_k^{-1} R_k, $$

where $R_0$ and $R_k$ are restriction operators, $A_k$ are local stiffness matrices and K
is the total number of spectral elements. The matrix $A_0$ corresponds to a Laplace
operator defined on the element vertices only. Because of its low number of degrees
of freedom and global extent, the scalability of the coarse grid problem is mostly
limited by communication and latency. We note that the term “coarse grid” here
refers to the fact that $A_0$ is defined on the vertices of the spectral elements only.
The problem is therefore “coarse” in comparison to the solution fields, which 
are expanded on the Gauss—Lobatto—Legendre points inside each spectral element 
(typically order 10 quadrature points in each direction). When talking about the 
different levels arising from the coarsening phase of the AMG setup, we will use 
the term “coarse level" to avoid confusion. 
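The two-level structure of the preconditioner can be illustrated on a toy problem. The following is a minimal sketch, not the Nek5000 implementation: it assumes a 1D finite-difference Laplacian in place of the spectral element operator, contiguous overlapping index blocks playing the role of elements, and a Galerkin coarse operator built from linear hat functions centred on the subdomain interfaces. All function names are hypothetical.

```python
import numpy as np


def laplacian_1d(n):
    """3-point Laplacian on n interior nodes (Dirichlet ends, mesh scaling dropped)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def boolean_restriction(indices, n):
    """R_k: extracts the entries listed in `indices` from a length-n vector."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R


def coarse_restriction(n, n_sub):
    """R_0: transpose of linear interpolation from the interior subdomain interfaces."""
    size = (n + 1) // n_sub                # fine intervals per subdomain
    x = np.arange(1, n + 1)                # positions of the fine interior nodes
    R0 = np.zeros((n_sub - 1, n))
    for c in range(1, n_sub):              # coarse vertex located at x = c * size
        R0[c - 1] = np.clip(1.0 - np.abs(x - c * size) / size, 0.0, None)
    return R0


def additive_schwarz(A, n_sub, overlap):
    """Dense M^{-1} = R_0^T A_0^{-1} R_0 + sum_k R_k^T A_k^{-1} R_k."""
    n = A.shape[0]
    size = (n + 1) // n_sub
    R0 = coarse_restriction(n, n_sub)
    A0 = R0 @ A @ R0.T                     # Galerkin coarse operator (an assumption)
    M_inv = R0.T @ np.linalg.solve(A0, R0)
    for k in range(n_sub):
        lo = max(0, k * size - overlap)
        hi = min(n, (k + 1) * size + overlap)
        Rk = boolean_restriction(np.arange(lo, hi), n)
        Ak = Rk @ A @ Rk.T                 # local stiffness matrix
        M_inv += Rk.T @ np.linalg.solve(Ak, Rk)
    return M_inv


A = laplacian_1d(63)
M_inv = additive_schwarz(A, n_sub=8, overlap=2)
eig = np.sort(np.linalg.eigvals(M_inv @ A).real)
cond_preconditioned = eig[-1] / eig[0]
cond_original = np.linalg.cond(A)
```

Forming the dense inverse is only reasonable at this toy size; in practice the preconditioner is applied to a residual vector, with the local solves done independently per element and the coarse solve requiring global communication.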

As mentioned before, the solver of choice for large problems is currently an in- 
house AMG, developed specifically for Nek5000 [6], whose main drawback is a 
setup phase by an external and serial code. The default option for the setup step 
of the in-house AMG is a Matlab code, which uses Ostrowski coarsening with 
norm bound, a diagonal Chebyshev smoother, applied on the second branch of 
the V-cycle only, and an energy-minimizing interpolation, all described in Ref. [9]. 
Other properties of the AMG include no smoothing on the finest level, a number of 
smoothing steps predefined for each level during the setup phase and a coarsest level 
made of one variable only. The good scalability of the in-house AMG is due to the 
fact that it automatically chooses, at run time, the fastest communication strategy at 
each level of the coarsening process, between three options: a pairwise exchange, 
a crystal router method or an allreduce operation. Previous work has shown that, 
when far from the strong scaling limit, the total time spent in the pressure solver is 
typically 85-90% of the total computational time, including the time spent in the 
coarse grid solver, which amounts to about 5—10% of that [11]. 

As an intermediate step in a previous work [12], the coarsening and interpolation 
steps from this Matlab code have been transferred to BoomerAMG, while the 
smoother and the solver were left unchanged. This “hybrid” serial setup was shown 
to significantly reduce the setup time without affecting the rapidity and scalability 
of the in-house AMG solver. 

In the present work, we completely replace the in-house AMG by BoomerAMG,
which allows for the whole AMG problem (setup + solver) to be performed 
online and in parallel. The existing code requires only limited modifications. Local 
contributions to the coarse grid operator are built on each process and then handed 
over to BoomerAMG, which takes care of assembling the global operator and of 
communication. If the operator possesses a nullspace, the solution is normalized 
such that the mean of the solution entries is 0. Apart from that, the critical aspect 
of switching to BoomerAMG is the choice of parameters for the setup and for the 
solver that match the performance of the in-house AMG. 
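The nullspace normalization mentioned above amounts to subtracting the mean from the solution vector. A minimal serial sketch, assuming a small 1D pure-Neumann Laplacian as a stand-in for a singular coarse operator (the helper name `remove_mean` is hypothetical):

```python
import numpy as np


def remove_mean(x):
    """Shift x so that the mean of its entries is zero."""
    return x - x.mean()


# Singular 1D Laplacian with Neumann ends: constant vectors lie in its nullspace.
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0

b = remove_mean(np.arange(n, dtype=float))   # zero-mean, hence compatible, right-hand side
x = np.linalg.lstsq(A, b, rcond=None)[0]     # one particular solution of A x = b
x = remove_mean(x)                           # fix the free additive constant
```

In a distributed setting the mean would require a global reduction over all processes, which this serial sketch does not show.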


3 Optimal Parameter Selection 


The choice of parameters for the BoomerAMG solver is done by testing a set of 
parameters on a medium-sized test case and looking for the optimal combination. 
We consider the turbulent flow around a NACA 4412 airfoil at $Re_c = 400{,}000$
[13] on a mesh made of 253,980 elements, with polynomial order 11, and we run
the simulation for 30 timesteps. The best set of parameters is defined as the one 
which minimizes the time to solution for the pressure equation. This is achieved by 
balancing two competing aspects: the accuracy of the coarse grid solution and the 
total number of iterations of the GMRES solver used for the pressure equation. Since 
the AMG is used as a preconditioner, a high level of accuracy is not paramount. 
Yet, it should be sufficient to ensure efficient preconditioning. Based on results 
obtained with the in-house AMG, the initial error on the coarse grid problem should 
be reduced by approximately one order of magnitude. While the in-house AMG is 
designed to ensure that a given reduction in the error is attained at minimal cost, the 
BoomerAMG is designed to ensure the maximum reduction in the error occurs at 
a given cost. Therefore, the best choice of parameters for the BoomerAMG is case 
dependent; here we optimize this choice in the case of large 3D simulations, when 
the use of AMG is most relevant. 

All tests are run on 4096 processors on the Blue Gene Mira at the Argonne 
National Laboratory. We test a total of 96 combinations of the following param- 
eters: 


• Coarsening type: classical Ruge-Stueben (C1), Falgout (C2), PMIS (C3), HMIS
(C4), CGC (C5) and CGC-E (C6),

• Interpolation method: extended (I1) and extended+i (I2),

• Relaxation type: $\ell_1$-Gauss-Seidel forward solve on the down cycle + backward
solve on the up cycle (R1), $\ell_1$-scaled hybrid symmetric Gauss-Seidel (R2),
Chebyshev (R3) and $\ell_1$-scaled Jacobi (R4),

• AMG strength threshold: 0.25 and 0.5.


We assign a letter and a number to each option which will be used to identify the 
method when comparing the results. 
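The count of 96 combinations follows from the Cartesian product of the four option lists (6 × 2 × 4 × 2). A short sketch of how such a parameter sweep could be enumerated is given below; the dictionary layout and the label format are illustrative assumptions, and no hypre calls are made.

```python
from itertools import product

# Option lists as given in the text; keys are the identifiers used there.
coarsening = {"C1": "classical Ruge-Stueben", "C2": "Falgout", "C3": "PMIS",
              "C4": "HMIS", "C5": "CGC", "C6": "CGC-E"}
interpolation = {"I1": "extended", "I2": "extended+i"}
relaxation = {"R1": "l1-Gauss-Seidel, forward on down cycle + backward on up cycle",
              "R2": "l1-scaled hybrid symmetric Gauss-Seidel",
              "R3": "Chebyshev",
              "R4": "l1-scaled Jacobi"}
strength_threshold = [0.25, 0.5]

# Every combination of one coarsening, one interpolation, one smoother, one threshold.
combinations = list(product(coarsening, interpolation, relaxation, strength_threshold))

# Hypothetical run labels, e.g. "C4-I2-R1-0.5".
labels = [f"{c}-{i}-{r}-{t}" for c, i, r, t in combinations]
```

Each label would correspond to one timed run of the 30-timestep test case.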

Table 1 Five best timings for the BoomerAMG with corresponding parameters as compared to the
in-house AMG


Run TD. COS ime (9) 
с [mAeeaw- |- |- |- [эз [see [мя 
ь 5433 
pass [os 
d 
e 


1468 | 947 
mro | 9928 
г 75648 [100.13 


3 V-cycles for the BoomerAMG. Total number of iterations for the pressure solver and timings are
reported for 30 timesteps


Table 2 Five best timings for the BoomerAMG with corresponding parameters as compared to 
the in-house AMG 


Run ID CGS time б) 
a [ищем [| [зэ [655 [us 
а 67552 | 38.09 
683.65 0 
b sae |4031 
f 
е 


nio [мэз 
s0374 [4624 


1 V-cycle for the BoomerAMG. Total number of iterations for the pressure solver and timings are
reported for 30 timesteps


In all cases, we set a relative tolerance of 0.1 on the solution of the coarse grid 
problem and a maximum of 3 V-cycles for the AMG. Moreover, the problem on 
the coarsest level is solved by Gaussian elimination. Total timings and number of 
pressure iterations for 30 timesteps are presented in Table 1 for the in-house AMG 
and the five fastest combinations of parameters for the BoomerAMG. We assign 
a letter to each run for comparison later. The time spent in the pressure solver is 
reported for process 0 and the time spent in the coarse grid solver (CGS) is the
maximum value over all processors for a single run. Since an allocation on Mira is 
always made of cores that are physically contiguous and isolated from the rest of 
the network, the timings suffer little noise and uncertainty. We see that the number 
of pressure iterations is on par with the in-house AMG, but the time spent in the 
coarse grid solver is more than three times as much. As a result, the time spent in the
pressure solver is between 8 and 13% higher. 

To accelerate the BoomerAMG solver, we set the maximum number of V-cycles 
to 1, instead of 3, and perform another test using the optimal parameters. This 
should significantly accelerate the resolution of the coarse grid solver but reduce the 
accuracy of the solution, therefore increasing the number of pressure iterations. The 
corresponding results are presented in Table 2. A surprising result comes from run 
d, where the number of pressure iterations has actually decreased. Since this might 
be the sign of an unstable solver, we discard this choice of parameters. The second 
best set of parameters corresponds to HMIS coarsening, extended+i interpolation,
$\ell_1$-Gauss-Seidel forward solve on the down cycle + backward solve on the up cycle
for the smoother and an AMG strength threshold of 0.5, which we use for the rest 
of our simulations. 

We also experimented with more aggressive non-Galerkin coarsening to change 
the communication pattern on the largest AMG levels and reduce communication 
time. However, this caused a drop of accuracy for the coarse grid solver and an 
increase of iterations for the pressure solver, which led to slower overall timings as 
a result. 

We note that the time to setup the coarse grid problem, which includes building 
the matrix $A_0$ and performing the BoomerAMG setup, is negligible. In the present
configuration, it amounts to less than a second, whereas a single timestep takes 
about 20 s. Since the setup is performed once at the beginning of the simulation, we 
do not discuss the matter further. Furthermore, it is orders of magnitude lower than 
the serial versions of the setup [12]. 

Further analysis of the results shows that the use of classical Ruge-Stueben, 
Falgout, CGC or CGC-E significantly slows down the solver. Moreover, another 
valid choice for the smoother could have been $\ell_1$-scaled hybrid symmetric Gauss-Seidel,
whose speed is on par with our choice. Other relaxation methods, Chebyshev
and $\ell_1$-scaled Jacobi, are also consistently slower. Finally, the choice of the
interpolation method does not have a significant impact and both methods give very 
similar results. 


4 Scaling Results 


As is often the case with numerical simulations, we look for the fastest path to 
solution for a given problem. Therefore, a strong scaling study, where the total 
amount of work is fixed and the number of processes is increased, is a relevant 
measure of the efficiency of the BoomerAMG. This is opposed to a weak scaling 
analysis, where the amount of work per process is kept constant, which is not carried 
out here. 

We consider the turbulent flow inside a reactor assembly made of 61 wire- 
wrapped pins, a configuration appearing in a nuclear reactor core [3]. The mesh 
consists of 1,650,240 elements, uses polynomial order 7 and has a complex, fully 
three-dimensional topology, making it a relevant test case for evaluating precondi- 
tioning strategies. The initial velocity field is turbulent and we run the simulation 
for 10 timesteps. Two series of tests were conducted on two supercomputers: Mira 
and Hazel Hen. The number of compute nodes considered is 512, 1024, 2048, 4096 
and 8192 on the former machine and 256, 512, 1024, 2048, 4096 on the latter one. 
On both computers, the number of MPI processes per node is equal to the number of 
available compute cores, i.e. 16 on Mira and 24 on Hazel Hen. We use our previously
defined optimal parameters for the setup of the BoomerAMG and we also include
non-Galerkin coarsening. A drop-tolerance of 0.05 for sparsification is set as default
on all levels, with the exception of the five finest levels, which have respective
drop-tolerances of 0.0, 0.01, 0.02, 0.03 and 0.04. This choice of parameters is motivated
by the fact that, unlike with the wing case, the time for the coarse grid solver is 
reduced by about 25%, as tests on 6,144 cores on Hazel Hen have shown, without 
impacting the number of pressure iterations. 

First, let us mention that the setup time for the BoomerAMG solver is once again 
negligible in comparison to the time for the entire simulation, requiring less than 
3 s on any number of cores on any machine. It is also significantly lower than
reading the data of the in-house AMG, a serial process, which takes about 80-90 s.
Therefore, it does not represent a bottleneck and we do not investigate timings for
the setup phase in detail.

Next, we present the strong scaling results, based on a single run per core count. 
The reported value for the time spent in the pressure solver is the timing from core 
0. The time spent in the coarse grid solver is measured on each processor and we 
consider the maximum value among all processes. The average time per timestep for 
the pressure solver on Mira is shown in Fig. 1, left plot. Unlike what was observed 
for the wing simulation, the choice of AMG solver does not impact the number of 
pressure iterations. The in-house AMG is slightly faster than the BoomerAMG on 
all core counts and it seems to scale marginally better. Since all other timings are 
the same, the reason is a faster coarse grid solver, as can be seen in Fig. 1, right 
plot. The in-house AMG achieves a better performance because it optimizes the 
communication process independently on each level of the AMG. On the coarsest 
levels, it is able to take advantage of the fast allreduce operation offered by the 
network of the Blue Gene architecture in hardware. On the finest levels, it picks
the faster of a crystal-router strategy and a pairwise exchange.

[Figure: two plots of time versus No. of cores (8192, 16,384, 32,768, 65,536, 131,072), each with curves for Lin. scaling, in-house AMG and BoomerAMG]

Fig. 1 Rod-bundle test case on Mira, with 1,650,240 elements, leading to 12 or 13 elements per
core for the largest core count. AMG parameters: HMIS coarsening, extended+i interpolation,
$\ell_1$-Gauss-Seidel forward solve on the down cycle + backward solve on the up cycle for the smoother
and an AMG strength threshold of 0.5. Left: time spent in the pressure solver; value for process 0,
averaged over the total number of timesteps. Right: time spent in the coarse grid solver; maximum
value over all processes, normalized by the total number of timesteps

Fig. 2 Rod-bundle test case on Hazel Hen, with 1,650,240 elements, leading to 16 or 17 elements
per core for the largest core count. AMG parameters: HMIS coarsening, extended+i interpolation,
$\ell_1$-Gauss-Seidel forward solve on the down cycle + backward solve on the up cycle for the
smoother and an AMG strength threshold of 0.5. Left: time spent in the pressure solver; value
for process 0, averaged over the total number of timesteps. Right: time spent in the coarse grid
solver; maximum value over all processes, normalized by the total number of timesteps

On 131,072 cores, the actual speed up for the coarse grid solver is 3.11 and
the parallel efficiency is 0.194 for the in-house AMG, when the timings on 8,192
cores are used as references. These quantities are respectively 2.44 and 0.153 for
the BoomerAMG. As mentioned before, the network on Mira is characterized by
little noise and the uncertainty on the timings is low. Yet, these numbers are based
on a single run and are only indications.
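The quoted efficiencies follow from dividing the measured speed-up by the ideal one, which is the ratio of core counts (here 131,072 / 8,192 = 16). A minimal sketch of that arithmetic, using only the speed-ups reported in the text (the function name is hypothetical):

```python
def parallel_efficiency(speedup, cores, reference_cores):
    """Strong-scaling efficiency: measured speed-up over the ideal (linear) speed-up."""
    ideal_speedup = cores / reference_cores
    return speedup / ideal_speedup


REFERENCE_CORES = 8192
CORES = 131072

ideal = CORES / REFERENCE_CORES                                        # 16.0
eff_inhouse = parallel_efficiency(3.11, CORES, REFERENCE_CORES)        # in-house AMG
eff_boomeramg = parallel_efficiency(2.44, CORES, REFERENCE_CORES)      # BoomerAMG
```

Because the speed-ups are quoted to two decimals, the recomputed efficiencies agree with the reported 0.194 and 0.153 only up to rounding.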

The same timings on Hazel Hen are shown in Fig. 2. Unfortunately, the node 
allocation on this machine can be scattered and the interconnect noise, which is 
shared with the rest of the computer, can be high and unpredictable. Therefore, a 
thorough analysis of the scaling results is not possible from a single run and the 
data from Fig. 2 are only indicative. Nevertheless, the runs for each AMG solver on 
the same number of cores are obtained using the same node allocation. This makes 
a comparison between the two solvers on a given core count somewhat relevant. 
In contrast to the results on Mira, the BoomerAMG is slightly faster than the in- 
house AMG on most core counts; the exception is on 6144 cores, where the timings 
are almost equal. Furthermore, it is quite clear that the coarse grid solver on Hazel 
Hen does not scale at all. Indeed, the time spent in the coarse grid solver is almost 
constant from the lowest amount of cores considered. 

Based on the available data, we see that beyond 24,576 cores, the time
spent in the coarse grid solver accounts for roughly half of the time spent in the
pressure solver. At that point, there are about 67 elements and 35,000 grid points
per core, which is consistent with the strong scaling limit on a similar computer as 
identified in Ref. [11]. 
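The per-core loads quoted in this section can be recovered from the mesh size and the core counts. The sketch below assumes that polynomial order 7 corresponds to 8 grid points per direction in each element and that points shared between neighbouring elements are counted once per element; both are assumptions made for illustration.

```python
N_ELEMENTS = 1_650_240
POLYNOMIAL_ORDER = 7
POINTS_PER_ELEMENT = (POLYNOMIAL_ORDER + 1) ** 3   # assumed: (order + 1) points per direction


def elements_per_core(cores):
    """Average number of spectral elements per core."""
    return N_ELEMENTS / cores


mira_largest = elements_per_core(8192 * 16)        # 8192 nodes, 16 cores each
hazel_hen_largest = elements_per_core(4096 * 24)   # 4096 nodes, 24 cores each
at_scaling_limit = elements_per_core(24576)
points_at_scaling_limit = at_scaling_limit * POINTS_PER_ELEMENT
```

Under these assumptions the averages fall between 12 and 13 elements per core on Mira and between 16 and 17 on Hazel Hen at the largest core counts, and the load at 24,576 cores is about 67 elements, or roughly 34,000 points, per core, in line with the rounded figures in the text.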

5 Conclusions


We used the BoomerAMG solver from the hypre library for linear algebra to solve 
a global coarse problem that is part of the preconditioner for the pressure equation 
arising when time-integrating the Navier-Stokes equations. The set of parameters 
for the BoomerAMG setup that leads to the lowest solver time for the pressure 
equation is HMIS coarsening, extended+i interpolation, $\ell_1$-Gauss-Seidel forward
solve on the down cycle + backward solve on the up cycle for the smoother and an
AMG strength threshold of 0.5. We also used non-Galerkin coarsening, with more 
aggressive drop-tolerance on the coarser levels, to speed up the solver. This new 
method replaces an existing AMG solver, which is fast and scales well but requires 
a setup phase done externally in serial. Strong scaling was assessed for both AMG 
solvers on a real large-scale test case on two supercomputers: an IBM Blue Gene/Q 
(Mira) and a Cray XC40 (Hazel Hen). On Mira, the in-house AMG leads to a faster 
pressure solver than the BoomerAMG on all core counts. The maximum difference
is about 10% on 131,072 cores. This is because the in-house AMG is able to take
advantage of the fast hardware allreduce operation on this machine at the coarsest 
levels of the AMG solver; in that sense the present result was expected. On Hazel 
Hen, however, the BoomerAMG is consistently faster than the in-house AMG and 
we observe the strong scaling limit to be reached at about 24,576 cores. Overall, 
the BoomerAMG is a valid alternative to the in-house AMG; both methods are close 
in terms of performance and the BoomerAMG has the advantage to be set up online 
and in parallel. In particular for modern architectures, the BoomerAMG is even faster 
than the in-house AMG with obvious advantages in the setup phase. All the codes 
developed in this work are available from the https://github.com/nicooff/nek5000/ 
tree/amg hypre c Github repository. 

Future work will extend the use of BoomerAMG to mesh refinement, where an 
online and parallel AMG setup phase is a requirement. 


Acknowledgements We would like to thank Aleks Obabko for his advice and for sharing the 
wire-wrapped pin bundle case with us. Financial support by the H2020 EU Project “ExaFLOW: 
Enabling Exascale Fluid Dynamics Simulation" (grant reference 671571), and the Knut and Alice 
Wallenberg Foundation is gratefully acknowledged. This research used resources of the Argonne 
Leadership Computing Facility, which is a DOE Office of Science User Facility supported under 
Contract DE-AC02-06CH11357. Additional computer time was provided by ExaFLOW at HLRS 
Stuttgart, and by resources provided by the Swedish National Infrastructure for Computing (SNIC) 
at PDC Stockholm. 


References 


1. Baker, A.H., Falgout, R.D., Kolev, T.V., Yang, U.M.: Multigrid smoothers for ultraparallel
computing. SIAM J. Sci. Comput. 33(5), 2864–2887 (2011)

2. Baker, A.H., Falgout, R.D., Kolev, T.V., Yang, U.M.: Scaling hypre's multigrid solvers to
100,000 cores. In: Berry, M.W., Gallivan, K.A., Gallopoulos, E., Grama, A., Philippe, B., Saad,
Y., Saied, F. (eds.) High-Performance Scientific Computing: Algorithms and Applications, pp.
261–279. Springer, London (2012)

3. Brockmeyer, L.M., Sarikurt, F., Hassan, Y., Merzari, E.: CFD investigation of wire-wrapped
fuel rod bundles and flow sensitivity to bundles size. In: Proceedings of the 16th International
Topical Meeting on Nuclear Reactor Thermalhydraulics (NURETH-16) (2015)

4. Falgout, R.D., Jones, J., Yang, U.: The design and implementation of hypre, a library of parallel
high performance preconditioners. In: Bruaset, A.M., Tveito, A. (eds.) Numerical Solution of
Partial Differential Equations on Parallel Computers, pp. 267–294. Springer, Heidelberg (2006)

5. Fischer, P.F.: An overlapping Schwarz method for spectral element solution of the
incompressible Navier-Stokes equations. J. Comput. Phys. 133(1), 84–101 (1997)

6. Fischer, P.F., Lottes, J.W., Pointer, D., Siegel, A.: Petascale algorithms for reactor
hydrodynamics. J. Phys. Conf. Ser. 125(1), 012076 (2008)

7. Fischer, P.F., Lottes, J.W., Kerkemeier, S.G.: Nek5000 (2008). http://nek5000.mcs.anl.gov

8. Henson, V.E., Yang, U.M.: BoomerAMG: a parallel algebraic multigrid solver and
preconditioner. Appl. Numer. Math. 41(1), 155–177 (2002)

9. Lottes, J.W.: Towards robust algebraic multigrid methods for nonsymmetric problems. Springer
theses (2017)

10. Lottes, J.W., Fischer, P.F.: Hybrid multigrid/Schwarz algorithms for the spectral element
method. J. Sci. Comput. 24(1), 45–78 (2005)

11. Offermans, N., et al.: On the strong scaling of the spectral element solver Nek5000 on petascale
systems. In: Proceedings of the Exascale Applications and Software Conference. Stockholm
(2016)

12. Offermans, N., Peplinski, A., Marin, O., Fischer, P.F., Schlatter, P.: Towards adaptive mesh
refinement for the spectral element solver Nek5000. In: Proceedings of the DLES11
Conference. Pisa (2017)

13. Vinuesa, R., Negi, P.S., Atzori, M., Hanifi, A., Henningson, D.S., Schlatter, P.: Turbulent
boundary layers around wing sections up to $Re_c = 1{,}000{,}000$. Int. J. Heat Fluid Flow 72, 86–99
(2018)

14. Tufo, H.M., Fischer, P.F.: Fast parallel direct solvers for coarse grid problems. J. Parallel
Distrib. Comput. 161(2), 151–177 (2001)


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons licence and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter's Creative 
Commons licence, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter's Creative Commons licence and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Two Decades Old Entropy Stable Method
for the Euler Equations Revisited


Björn Sjögreen and H. C. Yee


1 Introduction, Objectives and Preliminaries 


The two decades old high order central differencing via entropy splitting and 
summation-by-parts (SBP) difference closure of Olsson and Oliger, Gerritsen and 
Olsson, and Yee et al. [2, 7, 25] is revisited. The entropy splitting is a form of skew- 
symmetric splitting in terms of the physical entropy of the nonlinear Euler flux 
derivatives. Central differencing applied to the entropy splitting form of the Euler 
flux derivatives together with SBP difference operators will, hereafter, be referred 
to as entropy split schemes. 

The objective is to prove for the first time, in the recent definition of entropy
stability based on the L2-energy-like norm estimate, that entropy splitting for
central schemes with SBP operators is entropy stable. The proof is to replace
the spatial derivatives by summation-by-parts (SBP) difference operators in the 
entropy split form of the equations using the physical entropy of the Euler equations. 
The numerical boundary closure follows directly from the SBP operator. No 
additional numerical boundary procedure is required. In contrast, Tadmor-type 
entropy conserving schemes [18] using mathematical entropies do not naturally 
come with a numerical boundary closure. A generalized SBP operator has to be 
developed [8]. Standard high order spatial central differencing as well as high order 
central spatial DRP (dispersion relation preserving) spatial differencing is part of 
the entropy stable methodology. An entropy split scheme satisfies the L2-energy
norm estimate readily without an added numerical dissipation term for smooth
flows. For flows containing discontinuities, the Yee et al. nonlinear filter approach
[10-12, 14, 15, 22-25] is employed at isolated computed locations after each
full time step of the entropy split method to suppress spurious oscillations while
maintaining accuracy on the remaining flow field. Since the nonlinear filter step is
executed as an Euler time discretization at isolated locations after the completion of
a full time step of the entropy stable central scheme, entropy conservation/stability
is valid almost everywhere. The efficiency and performance of the entropy stable
split schemes using the physical entropies are compared with the Tadmor-type entropy
conservative method [18] using mathematical entropies for long time integration
of a 2D smooth flow and a 3D direct numerical simulation (DNS) of turbulence
with shocklets. It is found that Tadmor-type entropy conservative methods require
twice the CPU time of the entropy stable split schemes using the same order of the
central scheme. Comparisons among the three skew-symmetric splittings (entropy
splitting [19, 20, 25], Ducros et al. splitting [1] and the Kennedy and Gruber
splitting [5]) on their nonlinear stability and accuracy performance without added
numerical dissipation for smooth flows are included. See [16] for additional details
and comparisons.


Remarks It is noted that the Hughes et al. formulation [4], which uses Harten's idea
[3] but solves the flow equations in nonconservative form in terms of the entropy
variables, is completely different from the entropy split schemes. The entropy split
scheme solves the entropy splitting form of the Euler flux derivatives, consisting of
a one-parameter family of conservative and non-conservative portions in terms of
the entropy variables. If the parameter satisfies the energy estimate, entropy stability
is immediate. The entropy split scheme has been generalized from a perfect gas to
a thermally perfect gas and gas flows consisting of a linear combination of perfect
gases [21, 25]. In addition, these high order schemes have been formulated on time
varying deforming curvilinear grids with free-stream preservation [17, 21].


2 Entropy Splitting of the Euler Flux Derivatives 


We consider the 3D equations of inviscid compressible gas dynamics

$$q_t + f_x + g_y + h_z = 0$$

with conserved variables $q = (\rho,\ \rho u,\ \rho v,\ \rho w,\ e)^T$ and fluxes in an arbitrary direction
$k = (k_1\ k_2\ k_3)$ with $|k|^2 = 1$, and

$$\hat{f} = k_1 f + k_2 g + k_3 h = (\rho\hat{u},\ \rho u\hat{u} + k_1 p,\ \rho v\hat{u} + k_2 p,\ \rho w\hat{u} + k_3 p,\ \hat{u}(e + p))^T, \qquad (1)$$

where $\hat{u} = k_1 u + k_2 v + k_3 w$. The total energy is related to the pressure $p$ by the ideal
gas law, $e = \frac{p}{\gamma - 1} + \frac{1}{2}\rho|u|^2$, where $\gamma > 1$ is a given constant, and $|u|^2 = u^2 + v^2 + w^2$.
An entropy is a convex function, $E(q)$, of the conserved variables that allows an
additional conservation law,

$$E_t + F_x + G_y + H_z = 0, \qquad (2)$$

when the solution is smooth. The entropy fluxes in the x-, y-, and z-directions are
denoted by $F$, $G$, and $H$, respectively. The entropy variables are defined by $v =
\nabla_q E$ (the notation $E_q$ for the gradient will sometimes be used). The convexity of
$E$ ensures that these are well-defined. The entropy conservation law (2) follows if
the relation $v^T f_q = \nabla_q F$ for the x-direction fluxes, and similarly for the y- and z-
directions, holds. Moreover, the entropy variables symmetrize the equations; $\partial f/\partial v$
is a symmetric matrix.
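Harten [3] considered the class of entropies

$$E = -\frac{\gamma + \alpha}{\gamma - 1}\,\rho\,(p\rho^{-\gamma})^{\frac{1}{\alpha + \gamma}}, \qquad (3)$$

where $\alpha$ is a parameter. To ensure that $E$ is convex, i.e., that the matrix $E_{q,q}$ is
positive definite, $\alpha$ is required to satisfy $\alpha > 0$ or $\alpha < -\gamma$. The full range for $\alpha$
was given in [25], while [3] only considered $\alpha > 0$, and [2] used only the special
case $\alpha = 1 - 2\gamma$ from $\alpha < -\gamma$. The corresponding entropy flux in the direction
$k = (k_1\ k_2\ k_3)^T$ is

$$\hat{F} = \hat{u}E.$$

The entropy variables $v = E_q$ are straightforwardly found to be

$$v = \frac{\rho}{p}\,s^{\frac{1}{\alpha + \gamma}} \left( -\frac{\alpha}{\gamma - 1}\frac{p}{\rho} - \frac{1}{2}|u|^2,\ u,\ v,\ w,\ -1 \right)^T, \qquad (4)$$

where $s$ denotes $p\rho^{-\gamma}$. The conserved variables are homogeneous functions of the
entropy variables (4),

$$q(\theta v) = \theta^{\beta} q(v), \qquad (5)$$

where $\beta = (\alpha + \gamma)/(1 - \gamma)$. From (5) it follows that

$$q_v v = \beta q, \qquad (6)$$
$$f_v v = \beta f. \qquad (7)$$

See [3, 16] for the proof. The range of $\alpha$, where $E_{q,q}$ is positive definite, translates
to $\beta$ satisfying $\beta < -\frac{\gamma}{\gamma - 1}$ or $\beta > 0$.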
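The relations (3)-(5) can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes one space dimension, q = (rho, rho*u, e), the special case alpha = 1 - 2*gamma, and an arbitrarily chosen test state. It compares the closed-form entropy variables with a finite-difference gradient of E, and checks the homogeneity in the equivalent form v(lambda*q) = lambda^(1/beta) v(q). All function names are illustrative.

```python
import numpy as np

GAMMA = 1.4
ALPHA = 1.0 - 2.0 * GAMMA                  # special case used in [2]
BETA = (ALPHA + GAMMA) / (1.0 - GAMMA)     # equals 1 for this alpha


def primitives(q):
    """Return (rho, u, p) for the 1D conserved state q = (rho, rho*u, e)."""
    rho, m, e = q
    u = m / rho
    p = (GAMMA - 1.0) * (e - 0.5 * rho * u * u)
    return rho, u, p


def entropy(q):
    """Harten entropy (3)."""
    rho, _, p = primitives(q)
    s = p * rho ** (-GAMMA)
    return -(GAMMA + ALPHA) / (GAMMA - 1.0) * rho * s ** (1.0 / (ALPHA + GAMMA))


def entropy_variables(q):
    """Closed-form entropy variables (4), restricted to one space dimension."""
    rho, u, p = primitives(q)
    s = p * rho ** (-GAMMA)
    scale = rho / p * s ** (1.0 / (ALPHA + GAMMA))
    return scale * np.array([-ALPHA / (GAMMA - 1.0) * p / rho - 0.5 * u * u, u, -1.0])


def fd_gradient(fun, q, eps=1e-6):
    """Central finite-difference gradient of a scalar function of q."""
    grad = np.zeros_like(q)
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        grad[i] = (fun(q + dq) - fun(q - dq)) / (2.0 * eps)
    return grad


q = np.array([1.2, 0.6, 2.5])              # arbitrary admissible test state
v = entropy_variables(q)

# v should equal the gradient of E with respect to q.
grad_matches = np.allclose(v, fd_gradient(entropy, q), rtol=1e-6)
# Homogeneity (5), written for v as a function of q.
homogeneous = np.allclose(entropy_variables(2.0 * q), 2.0 ** (1.0 / BETA) * v)
print(grad_matches, homogeneous)
```

For the chosen alpha the parameter beta equals 1, which is the value quoted later for the smooth vortex test.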
Entropy splitting of the Euler flux derivative in the x-direction with the y- and
z-directions suppressed [2, 25] is written as a weighted sum of a conservative part,
$f_x$, and a non-conservative part, $f_v v_x$, as

$$f_x = \frac{\beta}{\beta + 1} f_x + \frac{1}{\beta + 1} f_v v_x.$$

Replacing $f_x$ by this split flux derivative gives

$$q_t + \frac{\beta}{\beta + 1} f_x + \frac{1}{\beta + 1} f_v v_x = 0. \qquad (8)$$

The entropy splitting weights the non-conservative portion of the flux derivative by
$\frac{1}{1 + \beta}$. This means that the range $\beta > 0$ corresponds to a weight that is less than
1, whereas negative $\beta$ leads, unphysically, to a weight that is greater than 1. The
global entropy conservation can be rewritten as an L2-like estimate. The entropy
can be rewritten as

$$E(q) = \frac{1}{\beta + 1}\, v^T (E_{q,q})^{-1} v$$

by using the homogeneity (5). Due to a page limit, see [16] for further discussion.
Note that it is necessary to bound the eigenvalues of $E_{q,q}$ in order to make the
L2-like norm a valid estimate.


3 Semi-Discrete Entropy Split Discretization of the Euler 
Equations 


Consider the 1D compressible gas dynamic equations discretized on a domain $a <
x < b$ by a uniform grid $x_j = (j - 1)\Delta x + a$, $j = 1, \ldots, N$, and grid spacing
$\Delta x = (b - a)/(N - 1)$. Define the semi-discrete entropy split approximation

$$\frac{d}{dt} q_j + \frac{\beta}{\beta + 1} D f_j + \frac{1}{\beta + 1} (f_v)_j D v_j = 0, \quad j = 1, \ldots, N, \qquad (9)$$

where $D$ is an SBP difference operator. By entropy split scheme, we will always
mean the entropy split form of Eq. (8) discretized in space by a summation-by-
parts finite difference operator. The flux Jacobian matrix with respect to the entropy
variables, $f_v$, is symmetric. The SBP scalar product is denoted by

$$(u, v)_h = \Delta x \sum_{j=1}^{N} \omega_j u_j^T v_j,$$

where $\omega_j > 0$ are weights that are different from 1 only at a few points near the
boundaries. The operator $D$ satisfies the SBP property

$$(Du, v)_h = -(u, Dv)_h - u_1^T v_1 + u_N^T v_N, \qquad (10)$$

but is otherwise arbitrary. In the most common case $D$ is a standard SBP centered
difference operator, but other operators are possible.
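As an illustration of property (10), the sketch below builds the classical second-order accurate SBP first-derivative operator (boundary weights 1/2, central differences in the interior) and verifies the identity for two scalar grid functions. This particular operator is only an example chosen here; the paper uses higher order operators, and the function names are illustrative.

```python
import numpy as np


def sbp_second_order(n, dx):
    """Return (D, w) for the classical second-order SBP first-derivative operator.

    w holds the quadrature weights omega_j (1/2 at the two boundary points,
    1 elsewhere); D is central in the interior and one-sided at the boundaries.
    """
    w = np.ones(n)
    w[0] = w[-1] = 0.5
    q = np.zeros((n, n))
    for j in range(n - 1):
        q[j, j + 1] = 0.5
        q[j + 1, j] = -0.5
    q[0, 0] = -0.5
    q[-1, -1] = 0.5
    d = q / (dx * w[:, None])      # D = (dx * diag(w))^{-1} Q
    return d, w


def sbp_inner(u, v, w, dx):
    """Discrete scalar product (u, v)_h = dx * sum_j w_j u_j v_j."""
    return dx * np.sum(w * u * v)


n, a, b = 41, 0.0, 1.0
dx = (b - a) / (n - 1)
x = a + dx * np.arange(n)
D, w = sbp_second_order(n, dx)

u = np.sin(3.0 * x) + x**2
v = np.cos(2.0 * x)
lhs = sbp_inner(D @ u, v, w, dx)
rhs = -sbp_inner(u, D @ v, w, dx) - u[0] * v[0] + u[-1] * v[-1]
print(abs(lhs - rhs))   # zero up to rounding
```

The identity holds for any grid functions because Q + Q^T has only the two boundary entries -1 and +1, which is exactly the discrete analogue of integration by parts.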

A zero velocity, $u_1 = 0$, $u_N = 0$, boundary condition is enforced, corresponding
to wall boundaries. Thanks to the SBP property of the difference approximation the
derivation of entropy conservation for the continuous problem can be carried over
to the discretization.


Theorem 1 The approximation (9) together with the boundary conditions $u_1 = 0$
and $u_N = 0$ conserves the global entropy in the sense that $\frac{d}{dt} \sum_{j=1}^{N} \omega_j E_j = 0$.


A method is entropy dissipative, or "entropy stable", if the computed solution
satisfies (2) with inequality,

$$E_t + F_x + G_y + H_z \le 0.$$


Proof Denote

$$r_j = -\frac{\beta}{\beta + 1} D f_j - \frac{1}{\beta + 1} (f_v)_j D v_j.$$

The scheme (9) can be written

$$\frac{d}{dt} q_j = (Pr)_j, \qquad (11)$$

where the projection $P$ sets $u_1 = 0$ and $u_N = 0$. Because $P^2 = P$, applying $P$ to
both sides of (11) gives that

$$\frac{d}{dt} Pq = P^2 r = Pr = \frac{d}{dt} q,$$

i.e., that $Pq = q$ if the initial data satisfy the boundary conditions. For the entropy,

$$\begin{aligned}
\frac{d}{dt}\,\Delta x \sum_{j=1}^{N} \omega_j E(q_j) = (v, q_t)_h &= (v, Pr)_h = (v, r)_h - (v, (I - P)r)_h \\
&= (v, r)_h - (Pv, (I - P)r)_h = (v, r)_h \\
&= -\frac{\beta}{\beta + 1}(v, Df)_h - \frac{1}{\beta + 1}(v, f_v Dv)_h, \qquad (12)
\end{aligned}$$

where we use that $Pv = v$. This holds because the second component of $v$ is zero when
the x-velocity, $u$, is zero. We also use the orthogonality $(Pv, (I - P)r)_h = 0$. The entropy
equation is now of the same form as for the continuous problem, but replacing
integration-by-parts by summation-by-parts gives

$$\frac{d}{dt}\,\Delta x \sum_{j=1}^{N} \omega_j E(q_j) = -F_N + F_1.$$

Entropy conservation follows by observing that $F = uE$, so that the boundary
conditions imply that $F_1 = F_N = 0$.
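The summation-by-parts step of the proof can be spelled out as follows. This is a sketch for the 1D scheme (9) that uses only the symmetry of $f_v$, the homogeneity relation (7) and the SBP property (10). The identification $F = \frac{\beta}{\beta+1} v^T f$ is an assumption of the sketch; it is consistent with $F_x = v^T f_x$ and, for the entropies (3), with $F = uE$.

```latex
\begin{aligned}
(v, r)_h &= -\frac{\beta}{\beta+1}(v, Df)_h - \frac{1}{\beta+1}(v, f_v Dv)_h \\
         &= -\frac{\beta}{\beta+1}(v, Df)_h - \frac{1}{\beta+1}(f_v v, Dv)_h
            && \text{($f_v$ symmetric)} \\
         &= -\frac{\beta}{\beta+1}\left[(v, Df)_h + (f, Dv)_h\right]
            && \text{(by (7), } f_v v = \beta f\text{)} \\
         &= -\frac{\beta}{\beta+1}\left[v_N^T f_N - v_1^T f_1\right]
            && \text{(by (10))} \\
         &= -F_N + F_1 .
\end{aligned}
```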


If the boundary conditions are periodic, no SBP modification of the difference
operator is needed. Entropy conservation is proved with periodic boundary
conditions by direct application of the same technique as above. It can be shown that
the result carries over directly to the semi-discrete approximation, since only time
derivatives are used in the proof. Hence, the L2-like estimate

$$\frac{d}{dt} \sum_{j=1}^{N} \omega_j v_j^T (E_{q,q})_j^{-1} v_j = 0$$

is obtained for the approximation (9). It can be shown that Tadmor-type entropy
conservative discretizations using the Harten entropy and high order central spatial
differencing are also entropy conservative methods. See Sjögreen and Yee [16] for
the proof.


4 Numerical Experiments 


More extensive numerical experiments are reported in the extended version of 
this paper [16]. Previous studies using SBP boundary closures for non-periodic 
boundary conditions can be found in [25]. Here selected summary results are 
presented. 


Test Case 1: 2D Compressible Euler Simulation of Smooth Flow: Isentropic 
Vortex Convection 

The compressible Euler equations in two space dimensions are solved with initial
data

$$\rho(x, y) = \left(1 - \frac{(\gamma - 1)\beta^2}{8\gamma\pi^2}\, e^{1 - r^2}\right)^{\frac{1}{\gamma - 1}}, \qquad (13)$$

$$u(x, y) = u_\infty - \frac{\beta (y - y_0)}{2\pi}\, e^{(1 - r^2)/2}, \qquad (14)$$

$$v(x, y) = v_\infty + \frac{\beta (x - x_0)}{2\pi}\, e^{(1 - r^2)/2}, \qquad (15)$$

$$p(x, y) = \rho(x, y)^{\gamma}, \qquad (16)$$

where $r^2 = (x - x_0)^2 + (y - y_0)^2$, $\beta = 5$, $\gamma = 1.4$, $u_\infty = 1$, and $v_\infty = 0$. The exact solution is
the initial data translated, $u(x, y, t) = u_0(x - u_\infty t, y - v_\infty t)$.

The computational domain is $0 \le x \le 18$, $0 \le y \le 18$ with periodic boundary
conditions. The center of the vortex is chosen to be $(x_0, y_0) = (9, 9)$. The problem is
solved in time with the classical fourth-order accurate explicit Runge-Kutta method
to time $t = 72$, which corresponds to four revolutions of the vortex across the
domain.
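The initial data (13)-(16) and the translated exact solution can be evaluated as in the sketch below. It is illustrative only: the grid layout (100 points per direction, periodic, right endpoint excluded) and the function names are assumptions, and r is measured from the vortex center (x0, y0).

```python
import numpy as np


def isentropic_vortex(x, y, x0=9.0, y0=9.0, beta=5.0, gamma=1.4,
                      u_inf=1.0, v_inf=0.0):
    """Evaluate the initial data (13)-(16) at the points (x, y)."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    temp = 1.0 - (gamma - 1.0) * beta**2 / (8.0 * gamma * np.pi**2) * np.exp(1.0 - r2)
    rho = temp ** (1.0 / (gamma - 1.0))
    swirl = beta / (2.0 * np.pi) * np.exp(0.5 * (1.0 - r2))
    u = u_inf - swirl * (y - y0)
    v = v_inf + swirl * (x - x0)
    p = rho**gamma
    return rho, u, v, p


def exact_solution(x, y, t, length=18.0, u_inf=1.0, v_inf=0.0):
    """Initial data translated by (u_inf*t, v_inf*t), wrapped periodically."""
    xs = np.mod(x - u_inf * t, length)
    ys = np.mod(y - v_inf * t, length)
    return isentropic_vortex(xs, ys, u_inf=u_inf, v_inf=v_inf)


n = 100
xg = np.linspace(0.0, 18.0, n, endpoint=False)   # periodic grid, assumed layout
X, Y = np.meshgrid(xg, xg, indexing="ij")
rho0, u0, v0, p0 = isentropic_vortex(X, Y)
rho72, u72, v72, p72 = exact_solution(X, Y, 72.0)
```

At t = 72 the vortex has crossed the 18-unit periodic domain four times, so the exact solution coincides with the initial data; the error norms in Fig. 1 are measured against this translated field.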

Comparisons of high order classical central split schemes with high order DRP
schemes with grid refinements are reported in [13]. Due to space limitations, only
one grid, with maximum and L2 error norms compared with the exact solution,
is shown in Fig. 1. Here C08-DS represents eighth-order central differencing
applied to the Ducros et al. splitting form of the Euler flux derivatives. The
corresponding eighth-order entropy splitting, entropy conservative method and
Kennedy-Gruber splitting are indicated by "C08-ES", "C08-EC" and "C08-KGS".
If the computed solutions by "C08-DS", "C08-ES", "C08-EC" and "C08-KGS" are
nonlinearly filtered by a dissipative portion of WENO7 (seventh-order weighted
essentially nonoscillatory spatial method) with an adaptive flow sensor, they are
indicated by C08-DS+WENO7FI, C08-ES+WENO7FI, C08-EC+WENO7FI, and
C08-KGS+WENO7FI [14, 15, 22-25]. For the smooth flow without any turbulent
structure, $\beta = 1$ for the entropy split scheme. The $\beta$ parameter studies are reported
in [9, 16, 25]. In general, for compressible shock-free turbulence and turbulence with
shocklets, $\beta$ lies somewhere in the range $1.5 < \beta < 2.5$. In general, the optimal $\beta$
is problem dependent. A general conclusion is that $\beta$ should not be very large or
below 1.

Fig. 1 Inviscid 2D compressible vortex convection with $100^2$ grid points: comparison of
maximum-norm of error vs. time for C08-DS, C08-ES, C08-EC, and C08-KGS (top left), and
C08-DS+WENO7FI, C08-ES+WENO7FI, C08-EC+WENO7FI, and C08-KGS+WENO7FI (top
right). Bottom left and bottom right are the corresponding L2-norm of error vs. time

Other high resolution dissipative shock-capturing methods are also candidates
for the nonlinear filter approach, as are other optimal WENO or ENO methods.
However, with good control of the numerical dissipation away from discontinuities,
there is no need to use the more complicated and more CPU intensive
shock-capturing methods. The non-split C08 without any added numerical
dissipation diverges shortly after the time evolution begins. Results by WENO5 or WENO7 are very
diffusive with large maximum or L2 errors. For this smooth long time integration
flow, entropy splitting is the most accurate method.


Test Case 2: 3D Isotropic Turbulence with Eddy Shocklets 

The second numerical test problem computes decaying compressible isotropic
turbulence with eddy shocklets. For high enough turbulent Mach numbers weak
shocks (shocklets) develop from the turbulent motion. Here the initial turbulent
Mach number is 0.6. The Navier-Stokes equations are solved using $\gamma = 1.4$.
The computational domain is a cube with side length $2\pi$ and periodic boundary
conditions in all three directions. The initial datum is a random divergence free
velocity field, $u_{i,0}$, $i = 1, 2, 3$, that satisfies

$$\frac{3}{2} u_{rms,0}^2 = \frac{1}{2} \langle u_{i,0} u_{i,0} \rangle = \int_0^\infty E(k)\, dk$$

with energy spectrum

$$E(k) \sim k^4 e^{-2(k/k_0)^2}.$$

The computations were made with $u_{rms,0} = 1$ and $k_0 = 4$. The angular brackets
denote averaging over the entire computational domain. The density and pressure
fields are initially constant. The Taylor-scale Reynolds number, $Re_{\lambda,0}$, is 100. See
[6] for definitions of the quantities and more details about the set up of the problem.
The simulation is run to the final time 4.
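One way to fix the proportionality constant in the spectrum is to impose the normalization above exactly. The sketch below assumes E(k) = A k^4 exp(-2 (k/k0)^2) with a constant amplitude A (a name introduced here), uses the standard Gaussian moment integral to obtain A, and checks the result with a trapezoidal rule. It does not generate the random velocity field itself.

```python
import numpy as np


def spectrum_amplitude(u_rms=1.0, k0=4.0):
    """Amplitude A with int_0^inf A k^4 exp(-2 (k/k0)^2) dk = 1.5 * u_rms^2."""
    # int_0^inf k^4 exp(-a k^2) dk = 3 sqrt(pi) / (8 a^(5/2)), with a = 2 / k0^2
    a = 2.0 / k0**2
    return 1.5 * u_rms**2 * 8.0 * a**2.5 / (3.0 * np.sqrt(np.pi))


def energy_spectrum(k, u_rms=1.0, k0=4.0):
    """Model spectrum E(k) = A k^4 exp(-2 (k/k0)^2)."""
    return spectrum_amplitude(u_rms, k0) * k**4 * np.exp(-2.0 * (k / k0) ** 2)


# Numerical check of the normalization with a plain trapezoidal rule.
k = np.linspace(0.0, 40.0, 40001)
e = energy_spectrum(k)
integral = np.sum(0.5 * (e[1:] + e[:-1]) * np.diff(k))
k_peak = k[np.argmax(e)]
print(integral, k_peak)
```

With this form the spectrum peaks at k = k0, so k0 = 4 sets the most energetic wavenumber of the initial field.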

Figure 2 shows the comparison of two splitting methods (DS and KGS), ES
(entropy splitting and entropy stable) and EC (entropy conservative) using the same
nonlinear filter. The time evolution of the domain averaged kinetic energy (upper
left), enstrophy (upper right), temperature variance (lower left), and dilatation (lower
right) are compared. All four forms of the nonlinear filter method provide similar
resolution. All four schemes without the nonlinear filter are stable but not as accurate
as the nonlinear filter versions. Overall, DS splitting is slightly less CPU intensive
than ES. KGS skew-symmetric splitting is more CPU intensive than DS and ES. The
EC method is around two times more expensive than DS. In addition, as the order
of these methods increases, the gain in efficiency (CPU) by entropy split schemes
increases.

Fig. 2 3D isotropic turbulence problem with $64^3$ grid points. Comparison of two splitting methods
(DS and KGS), ES (entropy splitting and entropy stable) and EC (entropy conservative) using the
same nonlinear filter. Evolution of kinetic energy (upper left), enstrophy (upper right), temperature
variance (lower left), and dilatation (lower right). DNS computed on $256^3$ grid points and filtered
down to $64^3$ resolution is considered as the reference solution


Although entropy split methods are entropy conservative but not in conservation
form, Sect. 4 showed that they perform well on problems with shocklets.
Extensions of the entropy split scheme to other equations of state (non-
perfect gas) and to MHD can be found in the original 2000 paper by Yee et al. [25]. The
entropies (3) can be used to construct entropy conserving schemes in conservative
form. See [16] for the derivation.


A Mimetic Spectral Element Method
for Free Surface Flows


L. Nielsen and B. Gervang 


1 Introduction 


In the last decades, CFD simulations of free surface flows have become a key tool
in engineering analysis in the design of marine structures. To be able to obtain
valid estimates of environmental stress on ship-wave hydrodynamics, offshore wind
turbines, wave energy converters, and offshore production systems, the CFD tools
need to be able to account for non-linear wave-wave and wave-body interaction.
Traditionally, free surface flows have been simulated using lower order
methods; recently, however, spectral element methods have been used [2]. In contrast
to earlier work, in the present article we simulate 2D free surface waves using a
mimetic spectral element method. This ensures that the invariants of the system
(mass, momentum, and energy) are conserved throughout the simulation.

The governing equation for incompressible, Newtonian fluids is the Navier-
Stokes equation. Free surface waves can be assumed to be governed by an inviscid
and irrotational fluid flow. Assuming first the fluid to be inviscid we arrive at the
Euler equations,

$$\rho \frac{\partial u}{\partial t} + \rho (u \cdot \nabla) u = -\nabla p - \rho g,$$

together with the continuity equation

$$\nabla \cdot u = 0.$$

Using the vector identity $\frac{1}{2}\nabla(u \cdot u) = (u \cdot \nabla)u + u \times (\nabla \times u)$ and using that the fluid
is irrotational ($\nabla \times u = 0$), we can rewrite the momentum and continuity equations,

$$\rho \frac{\partial \nabla\phi}{\partial t} + \frac{\rho}{2} \nabla |\nabla\phi|^2 = -\nabla p - \rho g, \qquad (1)$$

$$\nabla^2 \phi = 0, \qquad (2)$$

where $\phi$ is a velocity potential defined as $u = \nabla\phi$, $\phi = \phi(x, z, t)$. We can now
rewrite the momentum equation as,

$$\nabla \left[ \rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla\phi|^2 + p + \rho g z \right] = 0,$$

which we can integrate in space to obtain the time dependent Bernoulli's equation,

$$\rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla\phi|^2 + p + \rho g z = C(t),$$

where $C(t)$ is an arbitrary function of integration. We assign $C(t) = 0$ by recalling
that $\phi$ and $\phi + \int C(t)\,dt$ yield exactly the same flow. Redefining $\phi$ and retaining the
symbol, $\phi = \phi + \int C(t)\,dt$, we obtain the time dependent Bernoulli's equation for
the problem as,

$$\rho \frac{\partial \phi}{\partial t} + \frac{\rho}{2} |\nabla\phi|^2 + p + \rho g z = 0. \qquad (3)$$

The governing equations for inviscid and irrotational flows for an incompressible
fluid are stated through (2) and (3), where the unknowns are the velocity potential, $\phi$,
and the pressure, $p$. Equations (2) and (3) together with proper boundary conditions
constitute a well-posed problem. The velocity potential, $\phi$, can be solved from the
Laplace equation and then substituted into the Bernoulli's equation to obtain the
pressure field.


1.1 Boundary Conditions 


The physical domain is shown in Fig. 1, where the notations are also illustrated.
The fluid domain $\Omega \subset \mathbb{R}^d$, $d = 2$, is a bounded, connected domain with piecewise
bathymetry $\Gamma^b \subset \mathbb{R}^{d-1}$. The time domain is taken as $T : t > 0$. The unknowns for
the problem become the velocity potential and the free surface elevation $\eta(x, t) :
\Gamma^{FS} \times T \to \mathbb{R}$. The pressure can hereafter be determined through (3).


Fig. 2 Computational domain


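The unsteady kinematic and dynamic free surface boundary conditions are given
by Zakharov [8],

$$\partial_t \eta = -\partial_x \eta\, \partial_x \tilde{\phi} + \tilde{w}\,(1 + \partial_x \eta\, \partial_x \eta) \quad \text{on } \Gamma^{FS} \times T, \qquad (4)$$

$$\partial_t \tilde{\phi} = -g\eta - \frac{1}{2}\left(\partial_x \tilde{\phi}\, \partial_x \tilde{\phi} - \tilde{w}^2 (1 + \partial_x \eta\, \partial_x \eta)\right) \quad \text{on } \Gamma^{FS} \times T, \qquad (5)$$

where the tilde signifies functions defined only on the free surface. The vertical component of
the velocity, $\tilde{w} = \partial_z \phi|_{z=\eta}$, is calculated by solving the Laplace problem (2) together
with the Zakharov boundary conditions (4) and (5) on the free surface. On the
bottom we have the no penetration condition,

$$\partial_z \phi + \partial_x h\, \partial_x \phi = 0, \quad \text{for } z = -h(x) \text{ on } \Gamma^b. \qquad (6)$$

On the inlet and outlet boundaries ($\Gamma \setminus (\Gamma^{FS} \cup \Gamma^b)$) the gradient of the velocity
potential is specified. The computational domain is shown in Fig. 2.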
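To make the structure of the free surface conditions (4) and (5) concrete, the sketch below evaluates their right-hand sides on a 1D set of surface nodes and advances them by one explicit Euler step. It is not the mimetic discretization of this paper: the horizontal derivatives are taken with `numpy.gradient` (plain finite differences), the vertical surface velocity is passed in as data rather than obtained from a Laplace solve, and all function names are illustrative.

```python
import numpy as np


def zakharov_rhs(x, eta, phi_s, w_s, g=9.81):
    """Right-hand sides of (4) and (5) on a 1D free surface.

    x      -- horizontal node coordinates
    eta    -- free surface elevation at the nodes
    phi_s  -- velocity potential evaluated on the free surface
    w_s    -- vertical velocity on the free surface (from the Laplace solve)
    """
    eta_x = np.gradient(eta, x)      # finite differences (assumption of this sketch)
    phi_x = np.gradient(phi_s, x)
    slope = 1.0 + eta_x * eta_x
    eta_t = -eta_x * phi_x + w_s * slope
    phi_t = -g * eta - 0.5 * (phi_x * phi_x - w_s * w_s * slope)
    return eta_t, phi_t


def forward_euler_step(x, eta, phi_s, w_s, dt, g=9.81):
    """Advance eta and the surface potential by one explicit Euler step."""
    eta_t, phi_t = zakharov_rhs(x, eta, phi_s, w_s, g)
    return eta + dt * eta_t, phi_s + dt * phi_t


x = np.linspace(0.0, 10.0, 101)
eta0 = np.zeros_like(x)          # flat surface
phi0 = x.copy()                  # uniform unit current
w0 = np.zeros_like(x)            # no vertical velocity for this trivial state
eta_t, phi_t = zakharov_rhs(x, eta0, phi0, w0)
eta1, phi1 = forward_euler_step(x, eta0, phi0, w0, dt=0.01)
```

For a flat surface under a uniform current the elevation does not change and the surface potential decreases at the constant rate 1/2, which is the kinetic term of (5).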


2 Discretization of Governing Equations 


The developed method adopts elements from differential geometry. The unknowns 
of our system are described by use of differential forms. In a three-dimensional 
setting we are making use of four types of sub-manifolds: points, curves, surfaces, 
and volumes, both as inner and outer oriented objects, see an example in Fig. 3. The 
mimetic spectral element method uses an approach similar to the Galerkin method of 
the finite element method where the numerical residual is weighted by an arbitrary 
weight function. In contrast to the traditional finite element method the arbitrary 
weight functions are taken from the dual space of the function space used by the 
unknowns.


Fig. 3 Three-dimensional dual de Rham complex showing the four types of sub-manifolds and
their different orientations (inner and outer)


2.1 Basis Functions 


For the polynomial representation we use Lagrange polynomials $l_i(\xi)$ and edge
polynomials $e_i(\xi)$, see [5]. The Lagrange polynomials are based on a Gauss-
Lobatto-Legendre (GLL) point distribution for the nodal values. The Lagrange
polynomials and edge polynomials satisfy the properties,

$$l_i(\xi_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}, \qquad \int_{\xi_{j-1}}^{\xi_j} e_i(\xi) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases},$$

and the edge polynomials are explicitly given in terms of the nodal Lagrange basis
functions $l_i(\xi)$ as

$$e_i(\xi) = -\sum_{k=0}^{i-1} dl_k(\xi), \qquad (7)$$

where $dl_k(\xi)$ is the exterior derivative applied to the 0-form $l_k(\xi)$. This definition of
the edge polynomial also implies, see [4] and [5],

$$dl_i = e_i - e_{i+1}. \qquad (8)$$
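The two basis properties can be verified directly. The sketch below is an illustration built on `numpy.polynomial`, not the authors' implementation. It assumes nodes indexed 0..N and edges indexed 1..N, constructs the GLL nodes as the endpoints plus the roots of the derivative of the Legendre polynomial of degree N, forms the edge polynomials from (7), and integrates each of them over every edge.

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial


def gll_nodes(n):
    """Return the n+1 Gauss-Lobatto-Legendre nodes on [-1, 1]."""
    interior = Legendre.basis(n).deriv().roots()
    return np.concatenate(([-1.0], np.sort(interior.real), [1.0]))


def lagrange_basis(nodes):
    """Return the nodal Lagrange polynomials l_i as Polynomial objects."""
    basis = []
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        basis.append(Polynomial.fromroots(others) / np.prod(xi - others))
    return basis


def edge_basis(lagrange):
    """Return edge polynomials e_i = -sum_{k<i} dl_k/dxi for i = 1..N."""
    derivs = [l.deriv() for l in lagrange]
    edges = []
    for i in range(1, len(lagrange)):
        e = Polynomial([0.0])
        for k in range(i):
            e = e - derivs[k]
        edges.append(e)
    return edges


def edge_integrals(nodes, edges):
    """Matrix A[i, j]: integral of the (i+1)-th edge polynomial over edge j+1."""
    n = len(edges)
    out = np.empty((n, n))
    for i, e in enumerate(edges):
        prim = e.integ()
        out[i] = prim(nodes[1:]) - prim(nodes[:-1])
    return out


N = 4
nodes = gll_nodes(N)
lag = lagrange_basis(nodes)
edg = edge_basis(lag)
L = np.array([[l(x) for x in nodes] for l in lag])   # L[i, j] = l_i(xi_j)
A = edge_integrals(nodes, edg)
```

Both `L` and `A` come out as identity matrices up to rounding: the nodal basis interpolates at the GLL points, and each edge polynomial has unit integral over its own edge and zero integral over all others.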
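

2.2 Mimetic Discretization in 2D


If we let the 0-form $\phi^{(0)} \in \Lambda^0(M)$ be expanded as

$$\phi^{(0)} = \sum_{i,j=0}^{N} \phi_{i,j}\, l_i(\xi)\, l_j(\eta), \qquad (9)$$

then we can write $\phi^{(0)}$ as a matrix-vector product

$$\phi^{(0)} = [L \otimes L]\,\phi = M^{(0)} \cdot \phi, \qquad (10)$$

where $L_{i,j} = l_i(\xi_j)$ and $\xi_j$ are the Gauss-Lobatto-Legendre (GLL) points.
If we let the 1-form $u^{(1)} \in \Lambda^1(M)$ be defined as

$$u^{(1)} = u^{\xi} d\xi + u^{\eta} d\eta, \qquad (11)$$

we can expand $u^{\xi}$ and $u^{\eta}$ using edge polynomials as,

$$u^{\xi} = \sum_{i=1}^{N} \sum_{j=0}^{N} u^{\xi}_{i,j}\, e_i(\xi)\, l_j(\eta), \qquad (12)$$

$$u^{\eta} = \sum_{i=0}^{N} \sum_{j=1}^{N} u^{\eta}_{i,j}\, l_i(\xi)\, e_j(\eta). \qquad (13)$$

The discrete one-form $u^{(1)}$ can also be written as a matrix-vector product, where $u$
is evaluated in the GLL points,

$$u^{(1)} = \begin{bmatrix} L \otimes E & 0 \\ 0 & E \otimes L \end{bmatrix} \begin{bmatrix} u^{\xi} \\ u^{\eta} \end{bmatrix} = M^{(1)} \cdot u, \qquad (14)$$

where $E_{i,j} = e_i(\xi_j)$.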
The 2-form $p^{(2)} \in \Lambda^2(M)$ is expanded using only edge polynomials,

$$p^{(2)} = \sum_{i,j=1}^{N} p_{i,j}\, e_i(\xi)\, e_j(\eta) = [E \otimes E] \cdot p = M^{(2)} \cdot p. \qquad (15)$$

The Laplace equation can be reformulated using a mixed formulation, see [1], where
the equilibrium equation and the constitutive relationship are separated into two
equations,

$$\nabla\phi = u, \qquad \nabla \cdot u = 0. \qquad (16)$$

Writing (16) using differential geometry for a 3-D geometry we obtain,

$$d\phi^{(0)} = u^{(1)}, \qquad (17)$$

$$dq^{(2)} = 0^{(3)}, \qquad (18)$$

$$q^{(2)} = \star u^{(1)}, \qquad (19)$$

where we have utilized the Hodge star operator. The Hodge star operator is a map,
which maps p-forms onto (n - p)-forms, where n is the dimension of the domain,
$\Omega$. Given a p-form, $A^{(p)}$, the Hodge star maps as follows:

$$\star A^{(p)}(\Omega^{n}) = \tilde{A}^{(n-p)}(\tilde{\Omega}^{n}), \qquad (20)$$

where the tilde denotes the change of orientation of the new form. The Hodge star is also
the coupling between the outer oriented domain and the inner oriented dual space,
as seen in Fig. 3.

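In 2-D, using differential geometry, equations (16) take the form,

$$d\phi^{(0)} = u^{(1)}, \qquad q^{(1)} = \star u^{(1)}, \qquad dq^{(1)} = 0^{(2)}. \qquad (21)$$

When the exterior derivative is applied to the balance equation of (21) we obtain,
see [5],

$$dq^{(1)} = \sum_{i,j=1}^{N} \left( q^{\xi}_{i,j} - q^{\xi}_{i-1,j} + q^{\eta}_{i,j} - q^{\eta}_{i,j-1} \right) e_i(\xi)\, e_j(\eta), \qquad (22)$$

where we have utilized (8). The equilibrium equation, the first equation in (21), is
equated to a zero valued 2-form. Expanding the last equation in (21) yields,

$$\sum_{i,j=1}^{N} f_{i,j}\, e_i(\xi)\, e_j(\eta) = \sum_{i,j=1}^{N} \left( q^{\xi}_{i,j} - q^{\xi}_{i-1,j} + q^{\eta}_{i,j} - q^{\eta}_{i,j-1} \right) e_i(\xi)\, e_j(\eta), \qquad (23)$$

where $f_{i,j} = 0$. The basis can then be cancelled and we can rewrite (23) as,

$$f = E^{(2,1)} q, \qquad (24)$$

where $E^{(2,1)}$ is an incidence matrix, only consisting of 0, 1 and -1. This matrix
relates the fluxes of $q$ to the volume integral of the balance equation, see Fig. 4.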
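The incidence matrix in (24) can be assembled from Kronecker products of a 1D difference matrix. The sketch below is a hypothetical construction for a single element with N x N cells; the ordering of the unknowns, the matrix name `E21`, and the helper names are assumptions made here, not the layout used by the authors.

```python
import numpy as np


def incidence_1d(n):
    """n x (n+1) matrix mapping nodal values to differences over the n edges."""
    d = np.zeros((n, n + 1), dtype=int)
    for i in range(n):
        d[i, i] = -1
        d[i, i + 1] = 1
    return d


def incidence_div_2d(n):
    """Incidence matrix for the discrete divergence on one element with n x n cells.

    Unknown ordering (an assumption of this sketch): first all q_xi[i, j],
    i = 0..n, j = 1..n, then all q_eta[i, j], i = 1..n, j = 0..n, each
    flattened in row-major order.
    """
    d1 = incidence_1d(n)
    eye = np.eye(n, dtype=int)
    return np.hstack([np.kron(d1, eye), np.kron(eye, d1)])


n = 3
E21 = incidence_div_2d(n)

rng = np.random.default_rng(0)
q_xi = rng.standard_normal((n + 1, n))    # fluxes through edges of constant xi
q_eta = rng.standard_normal((n, n + 1))   # fluxes through edges of constant eta
q = np.concatenate([q_xi.ravel(), q_eta.ravel()])

f = (E21 @ q).reshape(n, n)
f_direct = np.diff(q_xi, axis=0) + np.diff(q_eta, axis=1)
net_boundary_flux = (q_xi[-1].sum() - q_xi[0].sum()
                     + q_eta[:, -1].sum() - q_eta[:, 0].sum())
```

Because the matrix contains only 0 and +/-1, the cell balances depend on topology alone and not on the mesh geometry or polynomial order. Summing `f` over all cells telescopes to `net_boundary_flux`, which is the discrete counterpart of the divergence theorem behind the exact mass conservation reported in Sect. 3.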
Fig. 4 Three-dimensional representation of surface fluxes making up the
divergence of a volume integral


The first step in developing the discrete system is the formulation of the weak
form, where we make use of duality pairing between an arbitrary k-form, $\alpha^{(k)}$, and
an arbitrary (n - k)-form, $\beta^{(n-k)}$. The duality pairing is defined as,

$$\langle \alpha^{(k)}, \beta^{(n-k)} \rangle_{\Omega} = \int_{\Omega} \alpha^{(k)} \wedge \beta^{(n-k)}. \qquad (25)$$

The pairing with the (n - k)-form, $\beta^{(n-k)}$, takes the role of a weight function in
traditional finite element analysis; it lives in the dual space and carries the opposite
orientation. The result of duality pairing can also be represented as a matrix-vector
product,

$$\beta^{T} \cdot \tilde{M}^{(n-k)\,T} \cdot W \cdot J \cdot M^{(k)} \alpha = \beta^{T} \cdot \mathbb{M}^{(k)} \alpha, \qquad (26)$$

where $W$ contains the Gauss weights and $J$ is the Jacobian matrix. $\mathbb{M}$ is a mass
matrix of the corresponding discretized k- and (n - k)-form pairing, and the tilde denotes
a matrix of opposite orientation.

Using the generalized Stokes theorem [3] and applying integration by parts to the
balance equation (the last equation of (21)), we obtain

$$\int_{\Omega} dq^{(1)} \wedge \alpha^{(0)} = \int_{\Omega} d\left(q^{(1)} \wedge \alpha^{(0)}\right) + \int_{\Omega} q^{(1)} \wedge d\alpha^{(0)} \qquad (27)$$

$$= \int_{\partial\Omega} \left(q^{(1)} \wedge \alpha^{(0)}\right) + \int_{\Omega} q^{(1)} \wedge d\alpha^{(0)}. \qquad (28)$$

Using duality pairing, an inner product projection for the term with the Hodge star
operator, the expansions in (9)-(15), and appropriate boundary conditions we can
set up the matrix system for the discrete Laplace operator as shown in (29).


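[Eq. (29): block matrix system for the discrete Laplace operator, assembled from incidence
and mass matrices; the block layout is not recoverable from the source text.]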

Using the forward Euler scheme for the temporal term and pairing it with an
arbitrary 0-form, $\alpha^{(0)}$, we can rewrite the Bernoulli's equation as,

[Eq. (30): weak form of the Bernoulli equation (3) with forward Euler time differencing,
paired with $\alpha^{(0)}$; not recoverable from the source text.]

The density is considered a 2-form, which leaves (30) Hodge invariant. The interior
product i is defined in [7]. The discrete version of (30) takes the form,

[Eq. (31): discrete Bernoulli equation in terms of mass and incidence matrices; not
recoverable from the source text.]

The matrix $\mathbb{M}_{\phi}$ is derived from the interior product of the two 1-forms in the convective term,
and contains information of $\phi$ from the previous time step and consequently has to
be updated at each new time step.

The simulation is initialized by first solving the Laplace equation with the
prescribed boundary conditions. The initial velocity potential $\phi$ on the free surface
is set to $\phi(x, t = 0) = x$, and the free surface height is set to $\eta(x, t = 0) = 0$. At
the following time steps, the Zakharov free surface equations are solved to obtain
new values of $\phi$ as well as the free surface elevation, $\eta$.


3 Numerical Results 


The method is first applied to a non-temporal problem without a free surface. The
geometry sketched in Fig. 5 contains a cylinder in the middle of a square. On the
horizontal walls of the square and the cylinder wall the no penetration condition is
applied. On the left vertical boundary a fully developed velocity profile is specified
and on the right vertical boundary a constant velocity potential is defined. The
velocity potential $\phi$ and streamlines are shown in the middle section of Fig. 5. In
the right part of Fig. 5 the pressure field is shown. Figure 6 shows that we obtain
spectral convergence for both unknowns.

Furthermore, the balance equation $\nabla \cdot u = 0$ (conservation of mass) is satisfied
both globally and point-wise independent of polynomial order as shown in Fig. 7.

Next we apply the method to a temporal and free surface problem where we have
included a bump on the bottom boundary. The Zakharov free surface equations are
applied on the top horizontal boundary. In Fig. 8 the pressure field and the free
surface are plotted at t = 1, 100, 200.


Fig. 5 Left: multi-element mesh of the cylinder problem with corresponding boundary conditions.
Middle: solved velocity potential $\phi$ in black with corresponding streamlines in red. Right: Solved
pressure field from the Bernoulli equation

Fig. 6 The two unknowns of the system, the velocity potential, $\phi$, and the pressure field, $p$, are
shown to carry spectral convergence

Fig. 7 The mass balance equation of (16) ($\nabla \cdot u = 0$) is satisfied both globally and locally for any
order of the expanding polynomial

Fig. 8 Time progression of the pressure fields, $p$, at time steps t = 1, 100 and 200 are shown to
the left. To the right the height of free surface wave $\eta$ is shown (a scaling factor of 10 is used)


4 Discussion and Conclusion


Using an isoparametric, multi-element formulation the solution of the discretized 
Laplace equation shows spectral convergence. In addition, we observe that mass is 
conserved both globally and locally. 

In (31), the discretized Bernoulli equation was kept Hodge-invariant, leaving the 
equation metric free. This suggests that the fundamental invariant of the equation 
is conserved. The Bernoulli equation conserves the total energy of the system. 
However, in Fig. 9 it is observed that a small amount of energy is gained and lost in a 
periodic manner. It is also observed that the mean energy is constant. It was possible 
to time integrate over very long time periods without noticing any degradation of 
data and we conclude that energy is conserved over long time periods even though 
fluctuations were observed for short time periods. In the future we plan on using 
a mimetic time integration scheme, which was used in [6], as well as the mimetic 
spatial discretization that was used in the present work.


Fig. 9 Potential and kinetic energy is summed for the entire system at every time step and plotted
against time


Spectral/hp Methodology Study
for iLES-SVV on an Ahmed Body


Filipe F. Buscariolo, Spencer J. Sherwin, Gustavo R. S. Assi, 
and Julio R. Meneghini 


This work focuses on the correlation study between a computational and physical 
model of an Ahmed Body with slant angle of 25°, which generates a complex flow 
behaviour over the slant and back, with two vortices being generated from the side 
combined with separation on the slant. Physical results are from a wind tunnel test, 
performed by Strachan et al. [12] considering moving ground and Reynolds number 
of 1.7M, based on the length of the body. 

CFD simulations were performed using the code Nektar++, which is an open
source, spectral/hp element high-order solver whose methodology combines
mesh refinement (h) with higher polynomial order (p) for higher fidelity modelling.
It employs an implicit type turbulence model using a Spectral Vanishing Viscosity
(iLES-SVV) model, which works as a filter for high frequencies. The same physical
test conditions and tunnel test section were also considered, over a total time of
4 convective lengths, with the same Reynolds number of 1.7 million as in the reference
experiments.

Considering the drag coefficient values for fully developed cases on the 5th and
6th polynomial order, the maximum difference compared with experimental results
was 16%; however, the simulation does not consider the
upper support used in the experimental setup. Compared with the Spectral/hp element
LES-SVV case from literature, the agreement with the experimental drag coefficient
has been improved, reducing the gap from 45 to 16%. For the lift coefficient the
maximum difference between the simulation results and the experimental data
is only 3%. There is also a good agreement between the LDA measurements on
the end of the body and the results from the simulation. It is possible to observe
a more intense vortex core in the simulation results, as compared to experimental
data, which might well be explained by the upper support used to fix the Ahmed
body in the experimental test, which weakens the vortices.

The methodology shows promising results against the open literature once an 
appropriate validation study has been undertaken. Despite the relatively coarse 
resolution adopted the results are encouraging. Having identified an appropriate 
resolution, we will next consider other slant angles, to see how well these correlate 
with the experimental studies. 


1 Introduction 


Among all automotive bluff bodies in the literature, the most studied is the Ahmed
Body. It was first proposed by Ahmed et al. [2], based on previous work by Morel
[7], who was the first to study the behaviour of slanted bluff bodies. The Ahmed
Body was designed to have a shape similar to road vehicles and to generate their main
flow features, such as stagnation and separation points. The main dimensions of the
Ahmed Body are highlighted in Fig. 1.

Based on the results found by Ahmed et al. [2] on the variation of the slant
inclination angle, Huminic and Huminic [4] state that three different flow
configurations are found. From 0 to 12.5°, the airflow over the angled surface remains
fully attached before separating from the model when it reaches the vertical surface
of the back end. The flow from the angled section and the side walls produces a
pair of counter-rotating vortices, which continue downstream. From 12.5 to 30°,
the flow over the angled section becomes highly complex. Two counter-rotating
lateral vortices of increased size are shed from the sides of the angled section,
which affects the flow over the whole back end, causing a three-dimensional
wake. These vortices are also responsible for maintaining attached flow over the angled
surface up to an angle of 30°. From 30° and above, the flow is fully separated. There
remains, though, a weak tendency of the flow to turn around the side edge of the
model, a result of the relative separation positions of the flow over the model top and
that over the backlight side edges.

Due to limitations of the wind tunnel and resources, Ahmed performed
only force measurements on the bluff body during his experiments. In order to
better understand the flow phenomena on an Ahmed Body, Lienhart and Becker [6]
performed a study using Laser Doppler Anemometry (LDA), Hot-Wire Anemometry
(HWA) and static pressure measurements to investigate the flow and
turbulence structure around the Ahmed Body model for two slant angle conditions:
25 and 35°. The main scope was to supply a detailed data set acquired under
well-defined boundary conditions, similar to Ahmed's first test, which considered a
Reynolds number of 4.29 million based on the length and a static floor, to be used as
reference data for numerical simulations.

Fig. 1 Ahmed Body schematic drawing considering its main dimensions and 3D visualization

Aiming to reproduce the real highway conditions of a vehicle, Strachan et al.
[12] performed an Ahmed Body wind tunnel test with moving road conditions, in which
both the aerodynamic forces and the flow characteristics, by time-averaged LDA, were
recorded. The flow conditions were slightly different from the ones used in Ahmed's
first test: the flow velocity was reduced to 25 m/s, resulting in a Reynolds number of
Re = 1.7 million based on the body length, and the supports on the ground were replaced
by a fixing system on the top of the tunnel, due to the rolling road simulation.

The Ahmed Body stands as one of the most used validation cases for CFD codes
employed for automotive applications. Simulations employing a Reynolds-Averaged
Navier-Stokes (RANS) methodology are able to predict the drag coefficient with
good accuracy, even for cases with complex flow topology such as the slant
angle of 25°, with a correlation factor of around 95% compared to experimental
results; however, the flow physics does not agree, with the flow features usually
under-estimated. Attempts considering more refined methodologies such as Detached Eddy
Simulation (DES) and Large Eddy Simulation (LES) provide better correlation
with experiments when comparing the flow structures, but the aerodynamic quantities
lose accuracy.

A trend that has emerged to improve the confidence level of CFD simulations is the use of
high-order or high-fidelity methods, such as the spectral/hp element method [5]. The
spectral/hp element method combines, according to Xu et al. [13], the advantages
of the spectral element method, in terms of the properties of accuracy and rapid
convergence, with those of the classical h-version finite element method, which allows
complex geometries to be effectively captured. It also provides an attractive
higher-precision approximation to solve partial differential equations.

One software package that employs the spectral/hp element methodology is
Nektar++ [9]. Nektar++ is a cross-platform spectral/hp element framework which
aims to make high-order finite element methods accessible to the broader
community. This is achieved by providing a structured hierarchy of C++ components,
encapsulating the complexities of these methods, which can be readily applied to a
range of application areas, as stated by Cantwell et al. [3]. It allows the use of
highly complex solution approaches such as implicit LES (iLES), using a Spectral
Vanishing Viscosity (SVV) technique to stabilize the solution.

The latest achievements in high-fidelity turbulence modelling around an Ahmed
Body with slant angle of 25° are summarized in the compilation work of Serre
et al. [11], which presents a comparative analysis of recent simulations conducted in
the framework of a French-German collaboration on LES of Complex Flows at a
Reynolds number of 768,000. It compares the results obtained with different
eddy-resolving modelling approaches: two LES on body-fitted curvilinear grids,
namely LES with Smagorinsky model and wall function (LES-NWM) and wall-resolving
LES with dynamic Smagorinsky model (LES-NWR); a stabilized spectral method
known as iLES-SVV, similar to the one used in the present work, which is the
base of the Nektar++ code; and a DES-SST approach on an unstructured grid, with
the number of elements ranging from 18.5 to 40 million. Results for the flow field show
good agreement with the measurements of Lienhart and Becker [6], but with a gap in the
drag coefficient values of 17% for the best case and 45% for the one using
iLES-SVV.


2 Objectives 


The main objective of this work is to evaluate the aerodynamic behaviour, in terms
of the drag and lift coefficients, of an Ahmed Body with slant angle
of 25° using a spectral/hp element methodology, as shown in Fig. 2. To
achieve this, we first present a mesh study evaluating two different size refinements,
referred to as h-refinements; for each of those, we employ three high-order surface mesh
values to improve curvature representation. As the spectral/hp element method
also makes it possible to improve the solution by increasing the polynomial order,
and consequently the number of degrees of freedom, we also evaluated three high
polynomial orders for each mesh case, for a total of eighteen load cases.

Fig. 2 Representation of the Ahmed Body with slant angle of 25°

All load cases employ the moving ground condition and a Reynolds number of 1.7
million, based on the length of the body. Because the same conditions are considered,
results are compared with the experiments performed in the study of Strachan et al.
[12].


3 Spectral/hp iLES-SVV 


In this work, Nektar++ is used to run an implicit LES simulation using the spectral/hp
element method. In this method, the domain is first divided into non-overlapping elements,
which offers geometric flexibility and allows for local refinement. Simulations were
performed using the incompressible Navier-Stokes solver employing a velocity
correction scheme, combined with a Continuous Galerkin (CG) projection. More
details are presented by Cantwell et al. [3].

The mathematics behind Nektar++ considers the numerical solution of
partial differential equations (PDEs) on a domain $\Omega$, which may be geometrically
complex, for some solution $u$. Practically, $\Omega$ takes the form of a $d$-dimensional
finite element mesh consisting of elements $K_i$, embedded in a space of dimension $d_c$
such that $d \le d_c \le 3$, with $\Omega = \bigcup_i K_i$, where $K_i \cap K_j$ is an empty
set or an interface between elements of dimension $\bar{d} < d$. The PDE problem is then
solved in the weak sense, which requires that $u|_{K_i}$ be smooth with at least a
first-order derivative. It is therefore required that $u|_{K_i}$ is in the Sobolev space
$W^{1,2}(K_i)$, equivalent to $H^1(K_i)$, according to Adams [1]. For a continuous
discretisation, we impose $C^0$ continuity along element interfaces.

We assume the solution can be represented as $u^\delta(x) = \sum_n \hat{u}_n \Phi_n(x)$, a weighted
sum of $N$ trial functions $\Phi_n(x)$ defined on $\Omega$, and the problem becomes that of finding
the coefficients $\hat{u}_n$. The approximation $u^\delta$ does not directly give unique choices for
the coefficients $\hat{u}_n$. To achieve this, a restriction is placed on the residual so that its
$L^2$ inner product with respect to the test functions $\Psi_n(x)$ is zero. For a Galerkin
projection the test functions are chosen to be the same as the trial functions,
that is $\Psi_n = \Phi_n$. To construct the global basis $\Phi_n$, we first
consider the contributions from each element in the domain. Each $K_i$ is mapped
from a standard reference space $\hat{K} \subset [-1, 1]^d$ by a parametric mapping
$\chi_e: \hat{K} \to K_i$ given by $x = \chi_e(\xi)$, where $\hat{K}$ is one of the supported
region shapes and $\xi$ are $d$-dimensional coordinates representing positions in a reference
element, distinguishing them from $x$, which are $d$-dimensional coordinates in the Cartesian
coordinate space.

The next step is to construct a local polynomial basis on each reference element 
with which to represent solutions. For 3D regions, a tensorial basis may be used, 
where the polynomial space is constructed as the tensor-product of one-dimensional 
bases on segments, quadrilaterals or hexahedral regions. 
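
As a rough illustration of the tensor-product construction, the following Python sketch
builds a two-dimensional basis on the reference quadrilateral [-1, 1]^2 from
one-dimensional polynomials. The use of Legendre polynomials and the function names
`legendre_1d` and `tensor_basis_2d` are assumptions made for this example only; they are
not taken from the Nektar++ code, which may use a different modal basis.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_1d(order, x):
    """Evaluate the Legendre polynomials P_0..P_order at the points x.

    Returns an array of shape (order + 1, len(x)).
    """
    x = np.asarray(x, dtype=float)
    identity = np.eye(order + 1)
    # legval(x, c) evaluates sum_i c[i] * P_i(x); a unit vector selects one mode.
    return np.stack([legendre.legval(x, identity[p]) for p in range(order + 1)])


def tensor_basis_2d(order, xi1, xi2):
    """Tensor-product basis phi_pq(xi1, xi2) = P_p(xi1) * P_q(xi2).

    Returns an array of shape (order + 1, order + 1, number of points).
    """
    b1 = legendre_1d(order, xi1)
    b2 = legendre_1d(order, xi2)
    return np.einsum("pn,qn->pqn", b1, b2)


order = 4  # illustrative polynomial order (P4)
points = np.array([-0.5, 0.0, 0.5])
basis = tensor_basis_2d(order, points, points)
n_modes = basis.shape[0] * basis.shape[1]  # (order + 1)^2 modes per quadrilateral
```

The same idea extends to hexahedra with a third one-dimensional factor, which is why the
number of local degrees of freedom grows as (P + 1)^3 with the polynomial order P.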

Spectral/hp element discretisations generally lead to approximations that have low
dissipation and low dispersion per degree of freedom when compared to
lower-order methods. As stated by Xu et al. [13], when solving advection-diffusion equations
and nonlinear partial differential equations such as advection-dominated flows at
marginal resolutions, oscillations appear that may render the computation unstable.
Artificial viscosity, which has been used in many discretisation methods to suppress wiggles
associated with high wavenumbers, has been broadly and effectively used in
simulations using the Fourier method. A related concept is the so-called SVV,
which was originally proposed based on a second-order diffusion operator for
spectral Fourier methods. SVV has been explicitly regarded as a turbulence model
for implementing iLES, under the assumption that the action of subgrid scales on the
resolved scales is equivalent to a strictly dissipative action, as stated by Sagaut [10], even
though SVV is not explicitly designed as a subgrid-scale model. An example of a
1-D SVV kernel is:


$$
D_f = \begin{cases}
0, & p \le P_{cut} \\
\exp\left(-\dfrac{(p - P)^2}{(p - P_{cut})^2}\right), & p > P_{cut}
\end{cases}
\qquad (1)
$$


where $P$ is the total number of modes employed and $P_{cut}$ is the cutoff polynomial
order. SVV with the kernel function $D_f$ can be regarded as a low-pass filter. We
see that the SVV dissipation added to the high mode numbers with respect to the
spectral element discretisation does indeed yield dissipation at the global high wave
number scales of the solution.
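
The behaviour of such a kernel can be sketched in a few lines of Python. The example
assumes the commonly used exponential form exp(-(p - P)^2 / (p - P_cut)^2) above the
cutoff and zero below it; the function name `svv_kernel` and the values P = 8 and
P_cut = 4 are illustrative choices, not settings taken from the simulations reported here.

```python
import numpy as np


def svv_kernel(p, total_modes, p_cut):
    """Exponential SVV kernel: zero up to p_cut, rising smoothly to 1 at p = total_modes."""
    p = np.asarray(p, dtype=float)
    kernel = np.zeros_like(p)
    active = p > p_cut  # only modes above the cutoff receive extra dissipation
    kernel[active] = np.exp(
        -((p[active] - total_modes) ** 2) / ((p[active] - p_cut) ** 2)
    )
    return kernel


P = 8      # assumed total number of modes (illustrative)
P_CUT = 4  # assumed cutoff mode (illustrative)
modes = np.arange(P + 1)
kernel = svv_kernel(modes, P, P_CUT)
```

Modes at or below the cutoff are left untouched, while the highest mode is damped with
full weight, which is the low-pass filter behaviour described in the text.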

For this work, we employed a novel CG-SVV scheme with DGKernel, proposed
by Moura et al. [8], where the dissipation curves of CG of order p are matched to those
of DG of order p - 2, eliminating the non-smooth dissipation characteristics arising
from CG dissipation at high Reynolds number.
We first define the coordinate system with X as the streamwise direction, Y the vertical
direction and Z the spanwise direction. The Ahmed Body length of 1.044 m is
defined as 1 AL. The virtual wind tunnel dimensions are 2.74 x 1.66 m for the
test section, with a total domain length of 4 AL, similar to the study of Strachan et al.
[12]. The back of the Ahmed Body model is placed at X = 0, with the inlet at X = -2
AL and the outlet at X = 2 AL. A schematic setup is shown in Fig. 3.

In terms of boundary conditions, velocity was normalized to 1 in order to match
the Reynolds number previously stated and set as the inlet boundary condition. The
outlet was set as a high-order pressure outlet condition, and the floor was set with
the same velocity as the free stream in order to reproduce the moving floor effect.
The top and outer side walls and the Ahmed Body surface are set as no-slip walls,
and a symmetry condition is applied on the symmetry plane of the half model. Total
simulated time is 7 convective lengths AL, which
means that the flow is able to cross the whole domain.
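
The normalization can be checked with a short calculation. The sketch below assumes the
usual definition Re = U L / nu; the function name `kinematic_viscosity` is hypothetical.
With U = 1 and L = 1 AL, the non-dimensional viscosity is simply 1/Re, while the tunnel
values quoted in the text (25 m/s and 1.044 m) give the corresponding dimensional value.

```python
def kinematic_viscosity(reynolds, velocity=1.0, length=1.0):
    """Kinematic viscosity implied by Re = U * L / nu."""
    return velocity * length / reynolds


RE = 1.7e6  # Reynolds number based on the body length

# Normalized setup: U = 1 and L = 1 AL, so nu is just 1 / Re.
nu_normalized = kinematic_viscosity(RE)

# Dimensional values quoted in the text: U = 25 m/s, L = 1.044 m.
nu_tunnel = kinematic_viscosity(RE, velocity=25.0, length=1.044)  # in m^2/s
```

The dimensional value comes out at roughly 1.5e-5 m^2/s, which is in the expected range
for air at ambient conditions.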

This study evaluates two mesh configurations with different h-refinements,
referred to as the Original and Refined meshes, and for each of those, three
high-order surface mesh settings: 4th, 5th, and 6th order, generating six different meshes.
All mesh files were generated by NekMesh, the Nektar++ high-order mesh
generator. In both the Original and Refined meshes, two refinement zones were
generated. The first one, defined as the Ahmed Body refinement, ranges from
0.3 AL before the beginning of the geometry to 0.3 AL after the end of the body,
for a total length of 1.6 AL. The second, defined as the Wake Refinement
region, overlaps the first refinement from 0.3 AL before the end of the body and extends to 1.3
AL after the end of the body, in order to fully capture the flow phenomena in the
separation region, with the same total length of 1.6 AL, as illustrated in Fig. 4.

The Original mesh has a total of around 95,000 elements for the half model.
For the Refined mesh, the boundary layer setup was the same and the dimensions
were kept the same in terms of sizing, giving a total of 310,000
elements. Details of both meshes are shown in Fig. 5.

[Figure 3 labels: problem setup based on Strachan 2007 (half model); Re = 1.7e+6 based
on the length; inlet U = 1 (normalized); outlet with high-order boundary condition;
moving floor U = 1 (based on Strachan 2007); no-slip walls (tunnel walls and Ahmed body
surface); symmetry plane; flow direction]

Fig. 3 Schematic representation of the boundary condition on the Ahmed Body simulation

Fig. 4 Plane Z = 0 representation of mesh refinement regions. Ahmed Body Refinement region
highlighted in yellow and Wake Refinement region highlighted in black

Fig. 5 Plane Z = 0 mesh refinement comparison between two h-refinement cases. (a) Original
mesh. (b) Refined mesh

Most commercial CFD codes employ low-order methods, and the highest
order of polynomial interpolation usually seen for the solution is 3rd. The mesh
plays the major role in complex simulations such as LES, leading to a large
number of elements to reach a reliable result. To make use of the flexibility of the
spectral/hp element method, we proposed solutions considering polynomials with
order higher than 3rd within the previous mesh refinement studies, as higher-order
polynomials increase the degrees of freedom and the resolution of the mesh. For the
Nektar++ implicit LES simulations using the incompressible Navier-Stokes solver, we
evaluated three different polynomial expansions, 4th, 5th and 6th order, referred to
here as P4, P5 and P6. In summary, 18 load cases were evaluated using HPC with
432 CPUs for each case.
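
The count of 18 load cases follows from combining the two h-refinements, the three surface
mesh orders and the three polynomial expansions. A minimal Python sketch of this
enumeration is given below; the dictionary keys and variable names are illustrative only.

```python
from itertools import product

meshes = ["Original", "Refined"]  # two h-refinement levels
surface_orders = [4, 5, 6]        # high-order surface mesh settings
polynomial_orders = [4, 5, 6]     # solution expansions P4, P5 and P6

# One load case per combination of mesh, surface order and polynomial order.
cases = [
    {"mesh": mesh, "surface_order": s, "polynomial": f"P{p}"}
    for mesh, s, p in product(meshes, surface_orders, polynomial_orders)
]
n_cases = len(cases)
```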
5.1 Drag Coefficient Comparison Results


The drag coefficient for the 18 cases evaluated is shown in Fig. 6, together with the
maximum RMS and compared with experimental results: 9 cases from the Original
mesh with 95,000 elements, considering fourth, fifth and sixth polynomial order, and
9 from the Refined mesh with 310,000 elements, also considering fourth, fifth and sixth
polynomial order expansion.

From Fig. 6 it is possible to observe that, for the drag coefficient, the P4 polynomial
expansion presented mean drag results around 35% higher than the experimental
results for both mesh cases. For the P5 cases, considering again both the
Original and Refined mesh cases, the error was reduced to 5%; however, the results
change trend from over-predicted to under-predicted when the mesh is refined
further. The cases considering the P6 polynomial expansion presented the same trend
for both mesh cases, highlighting its consistency, although the mean error when
compared to experiments increases to 16%.


5.2 Lift Coefficient Comparison Results 


Similar to the drag coefficient graph, Fig. 7 shows the lift coefficient for all evaluated
cases, considering the Original and Refined meshes and fourth, fifth and
sixth polynomial order expansion. Maximum RMS is also plotted for all cases and
compared with experimental results from Strachan et al. [12].
Fig. 6 Drag coefficient for the 18 evaluated test cases. On the left, average values for Original
mesh, considering fourth, fifth and sixth polynomial expansions (P4, P5 and P6). On the right,
average values for Refined mesh, considering fourth, fifth and sixth polynomial expansions (P4,
P5 and P6)

Fig. 7 Lift coefficient for the 18 evaluated test cases. On the left, average values for Original mesh,
considering fourth, fifth and sixth polynomial expansions (P4, P5 and P6). On the right, average
values for Refined mesh, considering fourth, fifth and sixth polynomial expansions (P4, P5 and P6)


Analyzing Fig. 7, we observe that the h-refinement from the Original mesh to the
Refined mesh leads to results closer to experimental values when adopting P4 as
the polynomial expansion basis. For both the P5 and P6 polynomial expansions, lift
coefficient results present good agreement with experimental data, with a maximum
mean error of 5%.


5.3 Flow Structure Comparison 


For the flow structure comparison we focus on the Refined mesh case combined with the
6th order surface mesh, since it improved the correlation with experimental
results for the P4 polynomial expansion, kept a similar trend for P5 and gave
consistent results for P6 in terms of drag and lift coefficient prediction. An initial
comparison of the flow structures is presented in Fig. 8 through the iso-surface of
Q-criterion = 350 coloured by pressure, for the Refined mesh case with P4, P5 and P6
polynomial expansions.


Fig. 8 Iso-Surface of Q-Criterion = 350 colored by pressure on the Ahmed Body with slant angle
of 25°, considering Refined mesh and 6th order surface mesh for fourth, fifth and sixth polynomial
expansions (P4, P5 and P6)

Fig. 9 Contour of Lambda 2 on the plane x/L = 0 on the back of the Ahmed Body with slant angle
of 25°, considering Refined mesh and 6th order surface mesh for fourth, fifth and sixth polynomial
expansions (P4, P5 and P6)


From Fig. 8 it is possible to see that P4 is unable to define the vortex on
the side of the slant, which also explains the difference in both drag and lift
coefficients compared to experimental results. Results for P5 show the side vortex
clearly defined, and P6 is also able to capture the lower vortex, detailed in the lower
image, which is not present in the studies considering the Ahmed Body but is
important for understanding the behaviour with the moving floor. Figure 9 shows a
contour of Lambda 2 to illustrate the lower vortex detail on the plane x/L = 0, at the
back of the Ahmed Body.

We next focus only on the Refined mesh with 6th order surface mesh for P5 and
P6, since they were able to predict both lower and top vortices. Due to the nature of
the wind tunnel with moving ground used by Strachan et al. [12], the model had
to be fixed from the top by a stem, whose contribution can be removed in the drag
coefficient calculations; however, it might change the flow topology over the slant,
as stated by the authors themselves.

Comparing, on the plane x/L = 0.076, the measurements of the flow velocity in the
x direction U, normalized by the free stream velocity, of Lienhart and Becker [6]
(static floor, without the stem on the upper portion) with the results of Strachan et
al. [12], it is possible to notice intensity changes in the normalized U velocity,
which the latter authors attribute to the upper support.

As the simulations do not include the upper support, but do include the moving
ground, the top portion is expected to be similar to the measurements of Lienhart and
Becker [6] and the lower part to correlate with the measurements of Strachan
et al. [12]. Both P5 and P6 proved to have good agreement in terms of
normalized U velocity, as shown in Fig. 10.

A similar comparison is presented in Fig. 11 for the vortex intensity on the slant, on
the plane x/L = 0 at the back of the Ahmed Body, for the vertical velocity V normalized
by the free stream velocity. Here the simulations correlate closely with the Lienhart and
Becker [6] study, due to the absence of the stem support, but in this case the higher
polynomial order expansion P6 is able to capture more scales than P5 in the core
of the main vortex, highlighting the gain in resolution of the high-order simulations.
Fig. 10 Contour of U velocity normalized by free stream velocity on the plane x/L = 0.076 of the
Ahmed Body with slant angle of 25°, comparing LDA measurements of Lienhart and Becker [6]
with static floor without the stem (left), results of Strachan et al. [12] with moving floor and stem
support (middle left), Refined mesh and 6th order surface P5 (middle right) and Refined mesh and
6th order surface P6 (right)

Fig. 11 Contour of V velocity normalized by free stream velocity on the plane x/L = 0 of the
Ahmed Body with slant angle of 25°, comparing LDA measurements of Lienhart and Becker [6]
with static floor without the stem (left), results of Strachan et al. [12] with moving floor and stem
support (middle left), Refined mesh and 6th order surface P5 (middle right) and Refined mesh and
6th order surface P6 (right)


6 Conclusions 


With the advances in CFD codes, confidence level and computational power,
aerodynamic simulations are applied in almost every automotive company. The
reason is very simple: reduced development cost and time, which is an enormous
advantage in a competitive market.

High-fidelity simulations are becoming a reality for complex industrial cases,
improving resolution and delivering results in a reliable response time, as presented
for the Ahmed Body in this work.

In the mesh definition study, the surface mesh order does not seem to influence
the results in terms of aerodynamic quantities, presenting a similar trend for the same
polynomial order, as the Ahmed Body geometry has curved surfaces only on the
front portion.

Still on the mesh definition, as the h-refinement increases from the Original to the
Refined mesh, the drag coefficient values for P4 and P6 remain unchanged, while the P5
deviation from the experiment switches from positive to negative. We conclude that
consistency is shown for the P4 and P6 cases, but P6 presented the most reliable
results, with a maximum deviation of 16%. For the lift coefficient, results for P4
improved as the h-refinement increased, while P5 and P6 kept similar values; the best
agreement was found for the case considering the Refined mesh with 6th order surface
mesh and P6 as the polynomial expansion.

Flow structure results focus only on the Refined mesh with 6th order surface mesh, where
the main expected features were captured by the P5 and P6 cases. Those two simulation
cases confirmed that the lower portion behaves similarly to the moving
ground test conducted by Strachan et al. [12], whereas the top portion correlates closely
with the Lienhart and Becker [6] experiments, as the simulation allows the body to
be fixed without the upper support used in the experiment. This fact might also
explain the difference between the simulation results and the literature experiments, as
the simulation allows idealized configurations.

For all simulation cases, half of the body is simulated and a symmetry
plane is set at the middle portion. From Fig. 12, which shows the normalized U
velocity along a line at y/L = 0.15 on the plane x/L = 0, we observe that the
simulation has good agreement with experimental results, with a small distortion as
it gets closer to the symmetry plane.

[Figure 12 plot: Ahmed 25°, x/L = 0, y/L = 0.15; normalized velocity [-] against z/L [-];
legend entries include Lienhart and Strachan]

Fig. 12 Normalized U velocity distribution over a line at coordinate y/L = 0.15 on the plane x/L
= 0 at the back of the Ahmed Body with slant angle of 25°, comparing simulation results for P5
(red) and P6 (orange) with LDA measurements of Lienhart and Becker [6] (dark green) and
Strachan et al. [12] (light green)
A High-Order Discontinuous Galerkin
Solver for Multiphase Flows


Juan Manzanero, Carlos Redondo, Gonzalo Rubio, Esteban Ferrer, 
Eusebio Valero, Susana Gómez-Alvarez, and Ángel Rivero-Jiménez 


1 Introduction 


Multiphase flow is not a canonical problem, therefore different models can be found 
in the literature. Volume Of Fluid (VOF) model [9] is amongst the simplest. It 
defines a single set of momentum equations shared by all phases, whilst the volume 
fraction (fraction of a particular infinitesimal control volume which is occupied by 
each phase) is tracked throughout the domain following an advection equation. 
Phase-field methods [11] conserve the simplicity of VOF whilst increasing the 
physical meaning of the evolution equation of the fluids present in the simulation. 
The volume fraction is substituted by a phase-field parameter, which identifies each 
phase. In this work, the Cahn-Hilliard equation [4] is chosen to model the evolution 
of the phase-field parameter. 

The introduced model is discretised in space using a high-order discontinuous 
Galerkin method. These methods have been gaining popularity for the discretisation 
of conservation laws, such as the Navier-Stokes equations [5—7, 13, 16, 22, 26]. 
Specifically, we use a Discontinuous Galerkin Spectral Element Method (DGSEM) 
[2] that allows the generation of provably stable schemes [8]. These schemes provide 
enhanced robustness when compared to classical high-order methods [17—20]. 
As far as the temporal discretisation is concerned, we use an efficient
implicit-explicit approach that permits maintaining the time step restriction of a typical
one-phase Navier-Stokes solver. It should be noted that similar approaches to
model multiphase flows have been proposed in the past, see for example [29],
where an algorithm to model N immiscible incompressible fluids with high-order
methods is described. However, to the authors' knowledge, this is the first
implementation using the DGSEM.

The rest of the paper is organised as follows: in Sect. 2 the governing equations 
of the model are described. In Sect. 3 the numerical techniques to discretise the 
described model are introduced. Finally, in Sect. 4 the results of two validation test 
cases are shown. 


2 Governing Equations 


In this work we model multiphase flows with a phase field approach. The flow field
is modelled by means of the incompressible Navier-Stokes equations. The evolution
of each of the fluids is modelled with the Cahn-Hilliard equation, which defines a
phase field variable, $\phi \in [-1, 1]$, that identifies spatial coordinates occupied by
fluid 1, $\phi = -1$, fluid 2, $\phi = 1$, or an interface, $\phi \in (-1, 1)$. The value of the
thermodynamic properties of the fluids at each spatial coordinate can be computed
as:

$$
\rho = \rho_1 \left(\frac{1-\phi}{2}\right) + \rho_2 \left(\frac{1+\phi}{2}\right), \quad
\eta = \eta_1 \left(\frac{1-\phi}{2}\right) + \eta_2 \left(\frac{1+\phi}{2}\right), \qquad (1)
$$

where $\rho_i$ is the density of fluid $i$ whilst $\eta_i$ is the dynamic viscosity of fluid $i$. The
complete system is built considering first the momentum equation,


д (pv) 
with velocity $v$, static pressure $p$, Reynolds number Re (where $u_0$ is a
reference velocity whilst $L$ is a reference length), Capillary number Ca
(where $\sigma$ represents the surface tension), Froude number Fr (where $g$
is the gravity acceleration) and $e_g$ the gravity direction. Second, an artificial
compressibility method [25] is used to couple the divergence-free condition,

where $\rho_0 = \max(\rho_1, \rho_2)$ is a reference density, and $M_0$ is the artificial
compressibility Mach number. Third, the Cahn-Hilliard equation for the phase field,

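$$
\phi_t + \nabla \cdot (\phi v) = M \nabla^2 \mu, \quad \mu = -\phi + \phi^3 - \varepsilon^2 \nabla^2 \phi, \qquad (4)
$$

with $M$ the mobility, and $\varepsilon$ the interface width, the two free parameters of the
model. In (2) and (4), $\mu$ represents the chemical potential. Moreover, this equation
is designed to minimize the free-energy functional [4], $\mathcal{F}$,

$$
\mathcal{F}(\phi, \nabla\phi) = \int \left( \frac{1}{4} (1-\phi)^2 (1+\phi)^2 + \frac{1}{2} \varepsilon^2 |\nabla\phi|^2 \right) dx. \qquad (5)
$$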
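
As a small numerical check of the chemical potential, the sketch below assumes the form
mu = -phi + phi^3 - eps^2 * d^2(phi)/dx^2 in one dimension and evaluates it with
second-order finite differences (chosen here for brevity, not the DG discretisation used
in this work). For the planar interface profile phi = tanh(x / (sqrt(2) * eps)), which
minimizes the double-well free energy, the chemical potential should vanish. The function
name and parameter values are illustrative.

```python
import numpy as np


def chemical_potential_1d(phi, dx, eps):
    """mu = -phi + phi^3 - eps^2 * phi'' at interior nodes (central differences)."""
    laplacian = (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / dx**2
    core = phi[1:-1]
    return -core + core**3 - eps**2 * laplacian


eps = 0.05                             # assumed interface width (illustrative)
x = np.linspace(-1.0, 1.0, 2001)
dx = x[1] - x[0]
phi = np.tanh(x / (np.sqrt(2.0) * eps))  # equilibrium interface profile

mu = chemical_potential_1d(phi, dx, eps)
max_mu = float(np.max(np.abs(mu)))     # small: only finite-difference error remains
```

A non-zero chemical potential therefore measures how far the phase field is from this
equilibrium profile, and its Laplacian drives the evolution in the Cahn-Hilliard equation.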


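Note that the set of Eqs. (2)-(4) is written in non-dimensional form, where the
thermodynamic variables of fluid 1 are taken as reference values, e.g.,

$$
\rho(\phi) = \left(\frac{1-\phi}{2}\right) + \frac{\rho_2}{\rho_1} \left(\frac{1+\phi}{2}\right), \quad
\eta(\phi) = \left(\frac{1-\phi}{2}\right) + \frac{\eta_2}{\eta_1} \left(\frac{1+\phi}{2}\right). \qquad (6)
$$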
The set (2)-(4) can be written as an advection-diffusion system:

$$
\frac{\partial u}{\partial t} + \nabla \cdot F_e(u) = \nabla \cdot F_v(u, g) + S(u, g), \qquad (7)
$$

where $u = (\phi, \rho v, p)$ is the state vector, $g = (g_\phi, g_v, g_\mu) = (\nabla\phi, \nabla v, \nabla\mu)$ is the
gradients vector, $F_e(u)$ and $F_v(u, g)$ are the inviscid and viscous fluxes respectively,
and $S(u, g)$ is a source term,
3 Numerical Methods


The numerical implementation of (2)-(4) is performed using a high-order discon- 
tinuous Galerkin scheme for the spatial discretisation (DGSEM variant) and an 
implicit-explicit Euler scheme for the time discretisation. 
3.1 Spatial Discretisation Using a Nodal Discontinuous
Galerkin Scheme (DGSEM)


Discontinuous Galerkin (DG) schemes (see [15]) are constructed by tessellating 
the domain in non-overlapping elements, where the solution is approximated using 
polynomials of an arbitrary order, N. In this particular implementation, we use a 
nodal variant of the DG method, and we restrict ourselves to hexahedral elements. 

In each element we approximate the solution using polynomials written in a set of
local spatial coordinates $\xi = (\xi, \eta, \zeta) \in [-1, 1]^3$, which are related to the physical
space by a transfinite mapping,


$$
x = (x, y, z) = X(\xi) = X(\xi, \eta, \zeta). \qquad (9)
$$


Using the local coordinates, we write the solution using tensor product Lagrange 
polynomials, 


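$$\mathbf{u}(\mathbf{x})\big|_E \approx \mathbf{U}(\boldsymbol{\xi}) = \sum_{i,j,k=0}^{N} \mathbf{U}^{ijk}(t)\, l_i(\xi)\, l_j(\eta)\, l_k(\zeta), \tag{10}$$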


where the time-dependent coefficients $\mathbf{U}^{ijk}(t)$ are the nodal values of the solution $\mathbf{U}$, and $l_i(\xi)$ are the Lagrange polynomials based on a set of Gauss points $\{\xi_i\}_{i=0}^{N}$. To handle curvilinear geometries, we use a mapping $\mathbf{X}$ that relates the local and physical spaces. With this mapping, we can construct the covariant $\mathbf{a}_i$ and contravariant $\mathbf{a}^i$ bases, and their associated Jacobian $J$ and metrics matrix $\mathcal{M}$:


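$$\mathbf{a}_i = \frac{\partial\mathbf{X}}{\partial\xi^i}, \qquad \mathbf{a}^i = \nabla\xi^i = \frac{1}{J}\,\mathbf{a}_j\times\mathbf{a}_k \quad ((i,j,k)\ \text{cyclic}), \qquad J = \mathbf{a}_1\cdot(\mathbf{a}_2\times\mathbf{a}_3), \qquad \mathcal{M} = [J\mathbf{a}^1, J\mathbf{a}^2, J\mathbf{a}^3]. \tag{11}$$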
Following [14], we transform the system of Eqs. (7) to local coordinates, 
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with gradients, 


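$$J\mathbf{g}_v = \mathcal{M}\nabla_\xi\mathbf{v}, \qquad J\mathbf{g}_\phi = \mathcal{M}\nabla_\xi\phi, \qquad J\mathbf{g}_\mu = \mathcal{M}\nabla_\xi\mu, \tag{13}$$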
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and the chemical potential definition, 


$$J\mu = -J\phi + J\phi^3 - \varepsilon^2\,\nabla_\xi\cdot\left(\mathcal{M}^T\mathbf{g}_\phi\right). \tag{14}$$


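We obtain the DG scheme by replacing the continuous solution with its polynomial counterpart (10), multiplying (12), written in the compact form (7), by a polynomial test function $\varphi$ (of the same order $N$ as the solution), and integrating the result over one element $E = [-1, 1]^3$,

$$\int_E J\varphi\,\mathbf{U}_t + \int_E \varphi\,\nabla_\xi\cdot\mathbf{F} = \int_E \varphi\,\nabla_\xi\cdot\mathbf{F}_v(\mathbf{U}, \mathbf{G}) + \int_E J\varphi\,\mathbf{S}(\mathbf{U}, \mathbf{G}). \tag{15}$$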
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Next, we integrate by parts the terms containing divergences, which yields 
surface integrals. Since the solution is discontinuous at the inter-element faces, we 
replace the surface flux by a numerical flux, F*, 
where $\partial E$ represents the six surfaces of the element $E$. For the inviscid numerical flux $\mathbf{F}^\star$, we use the exact Riemann solver derived in [1], whilst for the viscous numerical flux we use the Symmetric Interior Penalty (SIP) method [27], with the penalty parameter value derived in [24] and recently discussed for the DGSEM in [21]. In (16), $\hat{\mathbf{n}}$ is the outward surface normal vector in local coordinates. To obtain the evolution equations for each nodal degree of freedom $\mathbf{U}^{ijk}$, we let $\varphi = l_i(\xi)\,l_j(\eta)\,l_k(\zeta)$, and compute the integrals using the Gauss quadrature points (and weights $\{w_i\}$) associated with the interpolation points (which integrate polynomials of degree up to $2N + 1$ exactly),

uk Ut ig 
dt 


Fi 5 
+— (6, ШО + — (вр, п, Gn 
Wi é=-1 Vj n=—1 


F? [= 
+ (6,1), oue) 
Wk t=- 


N 
5 > (onir + w Pm QE + трт an) = 
m 


1 


m=0 
к a E 
(iN — 8:0) + (бум — 230) + бт — бео) 
vj 
= p» (nat + vw; Pmj Еу 4 — + ЛИ gk. 
Uk 
т=0 


(17) 




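where $\mathbf{F}^{ijk} = \mathbf{F}(\mathbf{U}^{ijk})$ and $\mathbf{F}_v^{ijk} = \mathbf{F}_v(\mathbf{U}^{ijk}, \mathbf{G}^{ijk})$, with $\mathbf{G}^{ijk}$ the nodal values of the gradient $\mathbf{G}$. The symbol $\delta_{ij}$ represents the Kronecker delta. The derivative matrix $D_{ij}$ is defined as $D_{ij} = l_j'(\xi_i)$. To compute the gradient $\mathbf{G}$, we use the weak formulation of (13),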


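$$\int_E J\,\mathbf{G}\cdot\boldsymbol{\tau} = \int_{\partial E} \mathbf{U}^\star\,\mathcal{M}^T\boldsymbol{\tau}\cdot\hat{\mathbf{n}}\,\mathrm{d}S - \int_E \mathbf{U}\,\nabla_\xi\cdot\left(\mathcal{M}^T\boldsymbol{\tau}\right), \tag{18}$$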


where $\boldsymbol{\tau}$ is an arbitrary vector test function (from the space of polynomials of order $N$). Since we use the SIP method, we use solution averages to couple inter-element fluxes, $\mathbf{U}^\star = \{\{\mathbf{U}\}\}$. All the integrals involved in (18) are computed discretely, similarly to those in (16), i.e.,
The gradient nodal values $\mathbf{G}^{ijk}$ are introduced in the viscous fluxes $\mathbf{F}_v(\mathbf{U}^{ijk}, \mathbf{G}^{ijk})$ of (17), hence completing the discretisation of (16). Note that one needs to compute $\mathbf{g}_\phi$ before computing $\mu$ and its gradient $\mathbf{g}_\mu$.
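
The one-dimensional building blocks of this discretisation (Gauss points, quadrature weights and the derivative matrix $D_{ij} = l_j'(\xi_i)$) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the Newton iteration used to locate the nodes, and the barycentric construction of the derivative matrix are all choices made here.

```python
import math


def legendre(n, x):
    """Return (P_n(x), P_n'(x)) from the three-term recurrence (|x| < 1)."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_nodes_weights(N):
    """N + 1 Gauss-Legendre nodes and weights on [-1, 1], in ascending order."""
    n = N + 1
    nodes, weights = [], []
    for i in range(n):
        # Classical initial guess for the i-th root of P_n, refined by Newton.
        x = -math.cos(math.pi * (i + 0.75) / (n + 0.5))
        for _ in range(100):
            p, dp = legendre(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def derivative_matrix(nodes):
    """D[i][j] = l_j'(xi_i) for the Lagrange basis defined on the given nodes."""
    n = len(nodes)
    bary = [1.0] * n  # barycentric weights
    for j in range(n):
        for k in range(n):
            if k != j:
                bary[j] /= nodes[j] - nodes[k]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = (bary[j] / bary[i]) / (nodes[i] - nodes[j])
        # Rows sum to zero because the derivative of a constant vanishes.
        D[i][i] = -sum(D[i][j] for j in range(n) if j != i)
    return D
```

With $N = 4$, the resulting quadrature integrates polynomials up to degree $2N + 1 = 9$ exactly, and multiplying `D` by the nodal values of a polynomial of degree at most $N$ returns the nodal values of its derivative.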


3.2 Time Integration Using IMplicit-EXplicit (IMEX)
and Runge-Kutta Schemes


The time integration of (17) is performed with a combination of forward and backward Euler and explicit Runge-Kutta schemes. On the one hand, the Navier-Stokes equations are integrated by means of a third-order explicit Runge-Kutta (RK3) scheme [28]. On the other hand, the Cahn-Hilliard equation is integrated with a combination of explicit RK3 for the phase field advection, forward Euler for the chemical free energy, and backward Euler for the interfacial energy,


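$$\frac{\phi^{n+1} - \phi^n}{\Delta t} + \nabla\cdot(\phi\mathbf{v})^{\mathrm{RK3}} = M\nabla^2\left(-\phi^n + (\phi^n)^3 - \varepsilon^2\nabla^2\phi^{n+1}\right). \tag{20}$$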


The reason behind this choice is that the numerical stiffness of the bi-Laplacian ($\nabla^4\phi$) operator prevents the use of an explicit method, as it restricts the time step $\Delta t$ to impractical values. We treat only the interfacial energy implicitly, since it yields
а constant Jacobian matrix, represented by /\. In particular, the linear system to 
solve is, 


V2 1 n+l __ ф" ККЗ 2 n n43 
[y+ Slo = Fv. (way o) a» 
The Jacobian matrix is computed numerically (see [3]) and an LU factorisation is performed only at the first time step. In each subsequent iteration, the RHS of (21) is computed and the linear system is solved by means of forward and backward substitutions. Both the LU factorisation and the forward and backward substitutions are performed with the library MKL-PARDISO [23].
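
The factorise-once, solve-many structure of this IMEX step can be illustrated on a much simpler problem. The sketch below is not the authors' solver: it assumes a one-dimensional periodic domain, second-order finite differences in place of the DGSEM operators, zero velocity (no advection), the chemical potential $\mu = -\phi + \phi^3 - \varepsilon^2\nabla^2\phi$, and a small dense LU factorisation written in pure Python instead of MKL-PARDISO. All function names are hypothetical.

```python
import math


def laplacian_matrix(n, h):
    """Dense periodic second-difference matrix (n x n, n >= 3)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = -2.0 / h ** 2
        L[i][(i - 1) % n] += 1.0 / h ** 2
        L[i][(i + 1) % n] += 1.0 / h ** 2
    return L


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def lu_factor(A):
    """Doolittle LU without pivoting (adequate here: the matrix is SPD)."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def lu_solve(LU, b):
    """Forward substitution (unit lower factor), then backward substitution."""
    n = len(b)
    y = b[:]
    for i in range(n):
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


def make_imex_step(n, h, dt, M, eps):
    """Return a function advancing phi by one IMEX Euler step."""
    L = laplacian_matrix(n, h)
    L2 = matmul(L, L)  # discrete bi-Laplacian
    # Constant matrix [I/dt + M eps^2 L^2]: assembled and factorised only once.
    A = [[M * eps ** 2 * L2[i][j] + (1.0 / dt if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    LU = lu_factor(A)

    def step(phi):
        chem = [-p + p ** 3 for p in phi]  # explicit chemical free-energy term
        rhs = [p / dt + M * r for p, r in zip(phi, matvec(L, chem))]
        return lu_solve(LU, rhs)  # substitutions only, no refactorisation

    return step
```

Because the implicit operator does not depend on the solution, each call to `step` costs only one right-hand-side evaluation and two triangular solves. In this periodic setting the scheme also conserves the total amount of $\phi$ up to round-off.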


4 Validation 


The proposed methodology is tested with two test cases. First, the validity of 
the discontinuous Galerkin discretisation of the Cahn-Hilliard equation is tested 
with a benchmark spinodal decomposition problem [12]. Second, the validity of 
the coupled Cahn-Hilliard/Navier-Stokes system is tested with a two dimensional 
rising bubble test [10]. 


4.1 Spinodal Decomposition 


This test problem considers an initial mixture of two fluids. These fluids are immiscible and therefore tend to separate to minimise their free energy (5). As stated before, the geometry, initial condition and fluid parameters are taken from [12]. In particular, the initial condition for this benchmark problem is:


$$\phi(x, y) = -0.05\left[\cos(0.105x)\cos(0.11y) + \left[\cos(0.13x)\cos(0.087y)\right]^2 + \cos(0.025x - 0.15y)\cos(0.07x - 0.02y)\right]. \tag{22}$$


The physical domain is a “T” shape with a total height of 120 units, a total width of 100 units, and horizontal and vertical section widths of 20 units (Fig. 1). No-flux boundary conditions are applied at the boundaries. Following [12], the mobility is set to $M = 10$, whilst the interface width is set to $\varepsilon = 3.16$. The physical domain is discretised with an unstructured mesh of 326 elements and a polynomial order of $N = 4$. For the time discretisation, we use a time step $\Delta t = 10^{-3}$.
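
For reference, the initial condition (22) can be evaluated point-wise as in the sketch below. The function name is hypothetical, and the exponent 2 on the second term is assumed from the usual form of this benchmark.

```python
import math


def phi_initial(x, y):
    """Initial phase field of the spinodal decomposition benchmark, Eq. (22)."""
    return -0.05 * (
        math.cos(0.105 * x) * math.cos(0.11 * y)
        + (math.cos(0.13 * x) * math.cos(0.087 * y)) ** 2
        + math.cos(0.025 * x - 0.15 * y) * math.cos(0.07 * x - 0.02 * y)
    )
```

Each of the three terms is bounded by one in magnitude, so the initial mixture stays within $|\phi| \le 0.15$, far from the pure phases.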

Figure 1 shows qualitatively how the different phases separate, whilst Fig. 2 shows quantitatively the evolution of the total free energy with time. In Fig. 2, the results of this work are compared with those obtained in [12], validating the proposed method.

Fig. 1 "T" domain for the spinodal decomposition. Initial condition (left figure) and evolution with time (the right figure is the steady-state solution)

Fig. 2 Evolution of total free energy (5) with time


4.2 Rising Bubble 


This test case considers a bubble of light fluid submerged in a heavy fluid, both subjected to a gravitational field. Following [10], the initial configuration (see Fig. 3) consists of a bubble of radius $r = 0.25$ centred at $[0.5, 0.5]$ in a $[1 \times 2]$ domain. A no-slip boundary condition is used at the top and the bottom of the domain, whilst a free-slip condition is enforced at the vertical walls. Following [10], the Reynolds number is set to $Re = 35$, whilst $\sigma$ and $\varepsilon$ are set to 24.5 and 0.03125 respectively (this gives an Eötvös number $Eo = 10$), and both the density and viscosity ratios are set to $\rho_1/\rho_2 = \eta_1/\eta_2 = 10$. The gravitational acceleration is $g = 0.98$. The problem is discretised with $16 \times 32$ elements with a polynomial order of $N = 4$, and a time step $\Delta t = 4\cdot 10^{-6}$.
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Fig. 3 Initial condition of the rising bubble test problem 
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Fig. 4 Evolution of the center of mass of the bubble with time 


This test case is quantitatively compared with the results of [10] in Fig. 4, with satisfactory agreement. It should be mentioned that the benchmark results of [10] were obtained with a sharp-interface model, which may explain the small disagreement in the evolution of the center of mass shown in Fig. 4.


5 Conclusions 


A method to model incompressible two-phase flows is introduced. The model solves the incompressible Navier-Stokes equations coupled with the Cahn-Hilliard equation to track the evolution of the different fluids. The model is discretised in space using a discontinuous Galerkin spectral element method (DGSEM), whilst an efficient implicit-explicit approach is used to advance in time. The validity of the model is shown with two test cases. A spinodal decomposition benchmark problem is solved to validate the Cahn-Hilliard solver, whilst a rising-bubble test problem is solved to validate the coupled Cahn-Hilliard/Navier-Stokes system. Both test cases show good agreement with the literature, demonstrating the accuracy and robustness of the proposed method.
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1 Introduction 


Due to the rapid expansion of the commercial aviation industry, authorities have 
been tightening the legislation for aircraft noise. For instance, the European 
Commission has set a 65% reduction goal of overall aircraft noise from the year 
2000 to 2050 [1]. The noise generated by the jet exhaust is one of the main 
contributors to the overall aircraft noise, especially during take-off [2]. Moreover, in 
new generation ultra-high by-pass ratio turbofan engines the increased interaction 
between the engine jet and the high-lift devices can potentially affect the noise field 
[3]. Thus, our overall aim is to develop and investigate an accurate and efficient 
method for the prediction of far-field jet noise in installed jet configurations. 

Rapid growth in computing power during the last decades has enabled the use of scale-resolving numerical simulations for jet noise research at a lower cost than most experimental campaigns. Conventionally, 2nd-order numerical schemes
combined with surface integral techniques, particularly the Ffowcs Williams- 
Hawkings (FW-H) method [4] have been widely adopted for predicting the far-field 
noise, due to its simplicity and low cost. However, defining the envelope surface 
used in the FW-H method is not always trivial in complex configurations [5], for 
example, installed jets on aircraft wings. Also, the results may be overly sensitive 
to the size, shape and location of these surfaces. Now, directly resolving the 
Navier-Stokes (NS) equations for sufficiently accurate far-field jet noise results is 
prohibitively expensive [6]. LES using 2nd-order accurate finite volume schemes has proven to be reliable and robust for resolving the near field of jets, but the large numerical dispersion and dissipation errors of these schemes make them less suitable for the propagation of the sound waves to the far field. High-order methods provide more accurate propagation
due to their reduced numerical error but are insufficiently robust for simulating 
complex jet flows. Therefore, we have used a coupled approach in which a finite 
volume LES solver is used to obtain the acoustic sources, which are then transferred 
to a high-order acoustic solver that propagates noise to the far-field. 

The spectral/hp DG method [7] is capable of providing high-order accuracy and handling mixed mesh element types such as tetrahedra and hexahedra, thus providing a potential solution to geometrically complex acoustic problems. The solver based on this approach is AcousticSolver of the Nektar++ framework [8, 9]. The LES code HYDRA and the acoustic code AcousticSolver have been coupled and validated using hexahedral elements [10, 11]. A similar coupling strategy has been used previously for jet noise [12] and combustion noise on tetrahedral grids [9].

In this paper, our focus is on two aspects: (1) estimates of mesh design for the 
high-order solver using a canonical two-dimensional (2D) case and (2) comparison 
of three-dimensional (3D) turbulent isolated jet-noise results on a tetrahedral 
grid and a comparable hexahedral grid using the coupling approach. From the 
perspective of our near future work, the tetrahedral grid results provide motivation 
and parameters for the set-up of the coupled methodology for jet-flap interactions. 


2 Numerical Methods and Solvers 


In this section, the details of the high-order spectral/hp DG solver employed to solve the APE are provided, followed by a brief description of the LES code that solves the filtered compressible NS equations. Finally, the coupling of the two is briefly described.


2.1 APE Solver 


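Equations for Propagation The acoustic perturbation equations (APE) solved here are the ones proposed by Ewert and Schröder [6] in the APE-4 form. These equations describe the transport of acoustic fluctuations in a linearized form, where the source terms can be non-linear, and can be written as:

$$\partial_t p' + \bar{c}^2\,\nabla\cdot\left(\bar{\rho}\,\mathbf{u}' + \bar{\mathbf{u}}\,\frac{p'}{\bar{c}^2}\right) = \bar{c}^2 q_c, \tag{1}$$

$$\partial_t \mathbf{u}' + \nabla\left(\bar{\mathbf{u}}\cdot\mathbf{u}'\right) + \nabla\left(\frac{p'}{\bar{\rho}}\right) = \mathbf{q}_m, \tag{2}$$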
where $p'$ and $\mathbf{u}'$ are the acoustic pressure and acoustic velocity vector respectively, and $c$ is the speed of sound. The time-averaged quantities are denoted by the over-bar and acoustic fluctuations are primed. The left-hand side of (1) and (2) represents the advection of waves in the mean flow. The right-hand side describes different sources that may be present in a generic aeroacoustic problem.

Finally, the source terms $q_c$ and $\mathbf{q}_m$ are defined as:

$$q_c = -\nabla\cdot(\rho'\mathbf{u}')' + \frac{\bar{\rho}}{c_p}\frac{\bar{D}s'}{Dt}, \tag{3}$$

$$\mathbf{q}_m = -(\boldsymbol{\omega}\times\mathbf{u})' + T'\nabla\bar{s} - s'\nabla\bar{T} - \left(\nabla\frac{(u')^2}{2}\right)' + \left(\frac{\nabla\cdot\boldsymbol{\tau}}{\rho}\right)'. \tag{4}$$


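These terms are classified into four categories:


1. the non-linear terms: $-\nabla\cdot(\rho'\mathbf{u}')'$ and $-(\nabla(u')^2/2)'$,

2. the heat/entropy terms: $(\bar{\rho}/c_p)\,(\bar{D}s'/Dt)$ and $T'\nabla\bar{s} - s'\nabla\bar{T}$,

3. the viscous term: $(\nabla\cdot\boldsymbol{\tau}/\rho)'$ and

4. the vortical term, known as the Lamb vector, $\mathbf{L}' = -(\boldsymbol{\omega}\times\mathbf{u})'$.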


In this paper, only the Lamb vector L’ is considered as a source term because it is 
the dominant contributor for isothermal applications with strong vortical motions 
(shear layers and wakes), as demonstrated in [12, 13]. 


Numerical Solver The solver used for the above APE is called AcousticSolver, which is part of the open-source Nektar++ framework [8]. The solver employs a high-order spectral/hp element method with a DG formulation [7]. In short (for details see [9]), the present DG method works as follows:


1. The computational domain is divided into non-overlapping elements. 

2. The solution of the governing equations is approximated in each element by a weighted sum of basis functions, where the coefficients of the expansion are the unknowns. In the case of tetrahedral elements, the basis functions form a modified hierarchical Jacobi basis [8].

3. The discretised equation is then multiplied by a test function (same as the 
basis function) followed by integration over each element in order to obtain the 
variational form of the governing equations. 

4. The flux terms in the variational equation are responsible for communicating the 
information across the elements. The interface fluxes are calculated using the 
immediate left- and right-side values with a Riemann solver. 


The scheme used here to solve the Riemann problem is a local Lax-Friedrichs scheme as defined in [9]. The temporal discretisation is performed using a 4th-order Runge-Kutta scheme. A numerical sponge layer [14] is set up using source terms to damp the outgoing acoustic waves smoothly, thus minimising reflections from the boundaries of the domain.
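
The role of the interface flux in step 4 can be sketched for a one-dimensional reduction of (1) and (2) with a uniform mean flow. This is an illustration only and not the Nektar++ implementation: the function names, the 1D reduction and the use of $|\bar{u}| + \bar{c}$ as the local maximum wave speed are assumptions made here.

```python
def ape_flux_1d(q, rho0, c0, u0):
    """Physical flux of the 1D APE with uniform mean flow; q = (p', u')."""
    p, u = q
    return (u0 * p + rho0 * c0 ** 2 * u, u0 * u + p / rho0)


def lax_friedrichs_flux(q_left, q_right, rho0, c0, u0):
    """Local Lax-Friedrichs (Rusanov) flux between two interface states."""
    lam = abs(u0) + c0  # largest characteristic speed, max |u0 +/- c0|
    f_left = ape_flux_1d(q_left, rho0, c0, u0)
    f_right = ape_flux_1d(q_right, rho0, c0, u0)
    # Central average of the fluxes plus a dissipative jump penalty.
    return tuple(
        0.5 * (fl + fr) - 0.5 * lam * (qr - ql)
        for fl, fr, ql, qr in zip(f_left, f_right, q_left, q_right)
    )
```

The flux is consistent (it returns the physical flux when both states coincide), and for a medium at rest it reduces to exact upwinding of the two acoustic characteristics.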
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2.2 LES Solver 


The LES is performed using the in-house code of Rolls-Royce plc., HYDRA [15] 
that solves Favre-filtered unsteady compressible Navier-Stokes equations [10]. It 
is a density-based, spatially 2nd-order accurate finite volume cell-vertex code used 
for propulsion and turbomachinery applications. More details on the set-up of the 
spatial scheme used can be found in [10]. For the temporal discretisation, a 2nd- 
order, four-stage Runge-Kutta explicit algorithm is employed. The size of the time 
step is chosen to keep the Courant number less than unity. The code is capable of handling arbitrary mesh topologies, which is beneficial for complex geometries. The sub-grid scale model chosen is the $\sigma$-model [16] with model constant $C_\sigma = 1.35$ [17].


2.3 Coupling of Solvers 


The 3D data from the LES mesh is transferred and interpolated onto the APE mesh in real time. The interpolation is necessary because the two solvers have different meshes, designed specifically to capture the flow and the acoustics respectively. The transfer-interpolation process takes place in parallel. This is achieved using an MPI-based coupling strategy with the open-source library CWIPI [18]. More details on the coupling mechanism are provided in Lackhove et al. [9] and Moratilla-Vega et al. [10]. Note that larger time steps can be used for AcousticSolver since it is not required to resolve the small flow structures.


3 Test Cases 


Two cases are presented here. The first is a canonical noise propagation case with a well-defined vortex-pair source, run on AcousticSolver alone, in which the numerical error is studied by changing the mesh and the polynomial expansion order (P). The second case uses the mesh parameters from the first to propagate noise generated by an isolated jet in an LES. This case provides validation of the coupling for a 3D turbulent jet noise case on a fully tetrahedral mesh. The results are then compared to the ones obtained with the FW-H technique.


3.1 Spinning Vortex Pair 


The case is an acoustic wave propagation problem in two dimensions where the source is mathematically well-defined, as in the original work on APE [6]. The case is run with standalone AcousticSolver. The source takes the form of two point vortices at a distance $r_0$ from the origin, rotating with a circulation $\Gamma$. An analytical solution of the induced acoustic field was found by Müller and Obermeier [19] as:


4 

~ Pol (2) 

p= TEE NE (kr), (5) 
оо 


where $H_2^{(2)}$ is the 2nd-order Hankel function of the second kind; the rotation period is defined as $T = 8\pi^2 r_0^2/\Gamma$, the angular velocity as $\omega = \Gamma/(4\pi r_0^2)$ and the Mach number as $M_r = \Gamma/(4\pi r_0 c_\infty)$. The real part of Eq. (5) gives the pressure fluctuations. Ewert and Schröder [6] found the source term based on the Lamb vector that represents the acoustic field for this case as:


ту 2 
gus Ге, (t) > D ex ан 1)'ro(5) | 


8220219 202 | 


where $\mathbf{r} = (x, y)$, $\mathbf{r}_0 = r_0\mathbf{e}_r$, $\mathbf{e}_r = (\cos\theta, \sin\theta)^T$ and $\theta = \omega t$.

The computational domain considered is circular and extends to $250r_0$. The source parameters are set as in [6], i.e. $\Gamma/(c_\infty r_0) = 1.6$ and $M_r = 0.1273$. Simulations are run until the pressure fluctuations reach $r = 200r_0$ in order to minimise the boundary effects. All the meshes consist of triangular elements with a modified hierarchical Jacobi polynomial basis [8].
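
The quoted source parameters follow directly from the definitions of the rotation period, angular velocity and rotational Mach number. The short check below is a sketch with hypothetical names; the acoustic wavelength additionally assumes that the pair radiates at twice the rotation frequency, as expected for two identical co-rotating vortices.

```python
import math


def vortex_pair_parameters(gamma, r0, c_inf):
    """Derived quantities of a co-rotating point-vortex pair."""
    omega = gamma / (4.0 * math.pi * r0 ** 2)        # angular velocity
    period = 8.0 * math.pi ** 2 * r0 ** 2 / gamma    # rotation period
    mach = gamma / (4.0 * math.pi * r0 * c_inf)      # rotational Mach number
    wavelength = math.pi * c_inf / omega             # 2*pi*c_inf / (2*omega)
    return {"omega": omega, "period": period, "mach": mach,
            "wavelength": wavelength}
```

With $\Gamma/(c_\infty r_0) = 1.6$ (for instance `gamma=1.6, r0=1.0, c_inf=1.0`), this gives $M_r \approx 0.1273$ and an acoustic wavelength of roughly $24.7r_0$ under the stated assumption, which is the length scale relevant to the points-per-wavelength estimates.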

First, a reference simulation with polynomial order 4 (P4) is run on a fine uniform mesh. The resulting acoustic pressure field is plotted along a diagonal line in Fig. 1a. The result from this simulation matches its analytical counterpart well. Further comparisons are made with respect to this well-resolved P4 numerical result, henceforth called “P4 reference”. For the test cases, the meshes are coarsened radially and the polynomial order is raised in the propagation region, such that the distribution of solution points per wavelength (referred to as “ppw”) is similar in the radial direction. The radial growth rate is kept at approximately 1.023 with a geometric distribution in all the cases. In the source region, the mesh is kept the same, with a P1 expansion, for all cases. This allows a smooth transition in the distribution of solution points when crossing from one region to the other. A sample P2 mesh and contours are shown in Fig. 1b. Table 1 summarises the test runs.

Fig. 1 (a) Pressure field along х = y line, (b) solution points, and contours of the source term in the source region and the pressure field in the propagation region

Table 1 Details of the test cases run

Simulation     | Poly. order | N_r x N_θ | ppw at r = 150r0 | CPU cost
P4 (reference) | 4           | 260 x 152 | 90               | 6.53
P1 1x coarse   | 1           | 64 x 76   | 5.20             | 1.00
P2 2x coarse   | 2           | 32 x 77   | 5.15             | 0.82
P4 4x coarse   | 4           | 15 x 76   | 5.00             | 0.68

N_r is the number of elements in the radial direction

Fig. 2 Acoustic pressure and relative error comparison of the test cases. (a) Pressure along x=y line. (b) Relative error (as moving average)

Figure 2a compares the pressure fields in the different test cases with the P4 reference. As expected, the P1 simulation shows a considerable reduction in the amplitude, whereas P2 and P4 preserve it more accurately. For simplicity, we combine the dissipation and dispersion errors by calculating the overall relative error as a moving average (M.A.) over bins of approximately $30r_0$. This is plotted in Fig. 2b. For the P1 and P2 simulations, a 2% error limit is reached around $50r_0$ (ppw ≈ 9) and $85r_0$ (ppw ≈ 6.4) respectively. The P4 simulations remain below this limit under the present conditions (note ppw ≈ 5 at $150r_0$). The values of ppw for different P agree with those suggested in [20] and provide an estimate for mesh design at different polynomial orders.

Note that for the given ppw, mesh expansion rate and Riemann solver, we did not observe reflections of the acoustic signals at the inter-element boundaries. A caveat of the present study is that we calculate the total numerical error for brevity; the dissipation and dispersion errors could be studied separately, as done in [20] for a one-dimensional advection problem.


3.2 3D Turbulent Isolated Jet Noise 


As a step towards noise prediction of installed jets, an isolated jet is simulated using a tetrahedral mesh for AcousticSolver to verify the capabilities of the present methodology for complex 3D cases. Note that the coupling has already been validated on a cylinder in cross flow and a cylinder-airfoil interaction case in [11].


Jet Flow The LES is described here only briefly, since it is detailed in [10]. An isothermal turbulent jet issuing from a circular cross-section nozzle at Mach 0.9 and Reynolds number $Re = 10{,}000$ (based on the jet bulk velocity $U_j$ and the jet diameter $D_j$) is considered. Following Shur et al. [21], the present LES domain is cylindrical in shape and extends over $x/D_j \in [-5, 100]$ and $r/D_j \in [0, 50]$. The mesh has 190 x 75 x 49 nodes in the axial, radial and azimuthal directions respectively. It is refined in the shear layer development area and coarsened towards the outer boundaries. Figure 3a shows a central cross-section of the LES mesh. The inlet boundary condition is a total pressure profile.


Jet Acoustics The acoustics domain is cubical to facilitate control of the mesh growth. It extends over $[-5, 40]D_j$ in the streamwise direction and $[-25, 25]D_j$ in the transverse directions. Noise propagation on two different grids is compared: fully hexahedral (“hexa”) and fully tetrahedral (“tetra”). The former mesh consists of 107 x 69 x 69 elements in the streamwise and transverse directions respectively [10]. The tetra grid is generated to give a distribution similar to that of the hexa mesh in the vicinity of the jet, providing 300,000 elements in total. Figure 3b shows the two meshes, where it is seen that the nominal element size in the tetra mesh is slightly larger away from the jet nozzle. Results on the hexa grid (P4) are available from [10], and calculations are performed on the tetra grid in this study. The expansion type utilised is a P4 modified Jacobi basis [8]. A numerical sponge layer [14] of thickness $3D_j$ is applied at the outer boundaries to avoid reflections of the outgoing waves. The time step used is three times larger than that of the compressible LES. In line with Sect. 3.1 and [20], a value of ppw ≈ 5 is chosen for accurately resolving frequencies up to a Strouhal number $St = 0.9$.

Fig. 3 Cross-section view through the centre of the jet nozzle. (a) LES mesh elements. (b) AcousticSolver meshes


It has already been demonstrated in [10] that the LES flow quantities are in acceptable agreement with the high-order LES study of Shur et al. [21]. The noise propagation is calculated using the FW-H method [4] in addition to the present coupled approach. The nominal cut-off St for the integral surface defined is approximately 0.3, based on the 22 ppw criterion [6]. Figure 4 shows a visual comparison between the acoustic pressure field computed by LES alone and by the coupled LES-APE approach (on two meshes). Figures 4a and 4b show qualitatively that the coupled LES-APE approach retains more acoustic content (especially at higher frequencies) due to its lower numerical error. This difference is more pronounced in the direction perpendicular to the jet centre-line. Qualitatively comparable results are obtained on the tetra mesh, as depicted in Fig. 4c.

Figure 5 shows a quantitative comparison in terms of power spectral density (PSD) at two observer locations at a distance of $120D_j$. The PSD for FW-H is calculated over the surface indicated by the dashed line in Fig. 4a (details in [10]). Comparison is made with the LES of Shur et al. [21] and the experiment of Tanna [22] ($Re = 10^6$, Mach = 0.9). As previously observed in Fig. 4, the difference between FW-H and the present coupled approach is significant at higher frequencies. For the 30° location, the tetra mesh results match the hexa results well. At the 90° location, there is an improvement of the cut-off St from 0.3 to 0.8. A small discrepancy is seen at 90° for St > 0.8 (close to the cut-off St = 0.9). This may be improved by using a finer mesh in the far field. Overall, the APE results are an improvement over the present FW-H prediction in the high-frequency range. Moreover, the results from the tetra mesh are comparable to the ones from the hexa mesh. This implies that the present methodology using tetra grids can be extended to more complex cases (such as installed jets).

Fig. 4 Acoustic pressure field at the same time instant (in grayscale [-30, 30] Pa). (a) LES. (b) LES-APE (hexa). (c) LES-APE (tetra)

Fig. 5 PSD at $120D_j$ at two observer locations with respect to the jet centre-line. (a) 30°. (b) 90°


4 Conclusions 


The spectral/hp code AcousticSolver (part of the Nektar++ framework) has been employed for acoustic wave propagation. The favourable properties of this solver are its high-order accuracy and its capability to handle unstructured mesh elements. A study on a canonical test case with an analytical solution provided estimates for designing the mesh for the jet application. For a polynomial expansion of order P4, 5 solution points per wavelength are found to provide a low overall error. This value is close to the one reported in a related study [20]. These estimates are used to design a tetrahedral mesh for the prediction of noise from an isolated jet ($Re = 10^4$, Mach = 0.9). The noise sources are calculated by a 2nd-order accurate finite volume LES solver and interpolated onto the AcousticSolver mesh on-the-fly for noise propagation. The noise results thus obtained offer an improvement over the traditional FW-H method due to the high-order accuracy. The power spectral density (PSD) results of the noise signal at two different locations relative to the jet nozzle show that the PSDs obtained on the tetrahedral mesh agree with the ones obtained on a slightly finer hexahedral mesh. Further improvements may be achieved by refining the former mesh in the radial direction. These results are encouraging for noise prediction of more complex, industrially relevant geometries such as installed jets.
Dynamical Degree Adaptivity
for DG-LES Models


M. Tugnoli, A. Abbà, and L. Bonaventura


1 Introduction 


Discontinuous Galerkin spatial discretizations of compressible flows allow local degree adaptation (p-adaptation, for short) to be performed in a very straightforward way and almost without computational overhead, as shown e.g. in [6]. Dynamical adaptation was also applied successfully to inviscid geophysical flows in [11, 12]. All the previous works relied, however, on a refinement criterion that essentially estimates the approximation error in the $L^2$ norm. In [10], we argued that such a criterion may not be optimal for LES and proposed a different, physically based criterion that was shown to be more effective in a number of numerical experiments. The goal of this work, which summarizes some of the results presented in [9], is to extend the above approach to dynamical adaptation and to test the new criterion in a dynamically adaptive framework.


2 The DG-LES Approach and Its Numerical Implementation


The DG-LES model for compressible flows employed in this work, based on a Local Discontinuous Galerkin (LDG) discretization of the viscous terms [3], is fully described in [1], to which we refer for all the details on the model equations and numerical discretization approach. Here, only a short description of the discretization elements necessary to introduce dynamical adaptivity will be reported. On the computational domain $\Omega \subset \mathbb{R}^3$ a tessellation $\mathcal{T}_h$ is defined, composed of non-overlapping simplicial elements. A discontinuous finite element space $V_h$ is defined as

$$V_h = \left\{ v_h \in L^2(\Omega) : v_h|_K \in \mathbb{P}^{q_K}(K),\ \forall K \in \mathcal{T}_h \right\}, \tag{1}$$


where $\mathbb{P}^{q_K}(K)$ denotes the space of polynomial functions of total degree $q_K$. The degree can vary arbitrarily from element to element, and the definition of a suitable way to assign such polynomial degree will be discussed in the following. The numerical approximation of the generic variable $a$ can be expressed as

$$a_h|_K = \sum_{l=0}^{n_\phi(K)} a^{(l)}\phi_l^K, \tag{2}$$

where $\phi_l^K$ are the basis functions on element $K$, $a^{(l)}$ are the modal coefficients of the basis functions and $n_\phi(K) + 1$ is the number of basis functions required to span the polynomial space $\mathbb{P}^{q_K}(K)$ of degree $q_K$, defined in $\mathbb{R}^3$ as:

$$n_\phi(K) = \frac{1}{6}(q_K + 1)(q_K + 2)(q_K + 3) - 1. \tag{3}$$


It is worth noting that the expression in (2) can be rewritten, thanks to the hierarchical nature of the basis, as

$$a_h|_K = \sum_{p=0}^{q_K}\sum_{l\in d_p} a^{(l)}\phi_l^K, \tag{4}$$

where $d_0 = \{0\}$ and $d_p = \{\, l \in 1 \ldots n_\phi(K) \mid \phi_l^K \in \mathbb{P}^p(K)\setminus\mathbb{P}^{p-1}(K) \,\}$ is the set of indices of the basis functions of degree $p$. A more or less accurate approximation can be obtained by increasing or decreasing the limit $q_K$ of the sum over $p$. It is also worth noticing that the basis normalization implies that the first coefficient of the polynomial expansion, $a^{(0)}$, coincides with the mean value of $a_h|_K$ over $K$.
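
The bookkeeping behind (3) and (4) can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes that the modal coefficients are stored in order of increasing polynomial degree, so that lowering the local degree amounts to dropping the trailing coefficients.

```python
def n_phi(q):
    """Largest basis-function index for total degree q in 3D, Eq. (3)."""
    return (q + 1) * (q + 2) * (q + 3) // 6 - 1


def degree_index_set(p):
    """Indices d_p of the basis functions of exact degree p."""
    if p == 0:
        return range(0, 1)
    return range(n_phi(p - 1) + 1, n_phi(p) + 1)


def truncate(coeffs, p):
    """Keep only the modal coefficients of degree <= p (degree-ordered basis)."""
    return coeffs[: n_phi(p) + 1]
```

For example, a degree-4 element carries `n_phi(4) + 1 = 35` coefficients per variable, of which only the first 10 are kept if the local degree is lowered to 2.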

In the present DG-LES approach, as discussed extensively in [1], the LES filtering operators are built directly into the DG discretization, in a spirit similar to the VMS approach [4]. Considering $\Pi_V : L^2(\Omega) \to V$, the $L^2$ projector over the subspace $V \subset L^2(\Omega)$, defined by

$$\int_\Omega \Pi_V u\, v\,\mathrm{d}\mathbf{x} = \int_\Omega u\, v\,\mathrm{d}\mathbf{x}, \qquad \forall u, v \in V,$$

it is possible to define the LES filtering $\bar{\cdot}$ as the projection over the finite-dimensional solution subspace $V_h$ in the following way:

$$\bar{a} = \Pi_{V_h} a. \tag{5}$$

The application of the main LES filtering is purely formal, since it coincides with the discretization of the equations. In this way, simply discretizing the equations leads to solving them for the filtered quantities.

Another parameter to be defined is the filter characteristic dimension, $\Delta$,
employed in the definition of all the eddy-viscosity based subgrid models. The
filter size is taken to be constant over each element, since the projection is
performed elementwise. While more refined definitions can be employed, see e.g.
[2], the simple definition


A(K) = J — = (6) 


was employed with success. For the time discretization, the five-stage, fourth-order
Strong Stability Preserving Runge-Kutta method proposed in [8] is employed. The
numerical implementation of the approach sketched above is built into the solver
dg-comp using the finite element toolkit FEMilaro [7].

A first attempt to introduce static p-adaptivity in a DG-LES framework has been 
presented in [10]. In order to overcome the limitations of classical error estimations 
in LES, a novel indicator based on the classical structure function 


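$$ D_{ij} = \left\langle [u_i(\mathbf{x} + \mathbf{r}, t) - u_i(\mathbf{x}, t)] \, [u_j(\mathbf{x} + \mathbf{r}, t) - u_j(\mathbf{x}, t)] \right\rangle \qquad (7) $$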


was proposed. Large values of the structure function calculated inside the element 
denote a poorly correlated velocity field and the need for higher resolution, while
a low structure function value denotes a highly correlated velocity field, which is 
an indication of a well resolved turbulent region or laminar conditions and of the 
possibility to employ a lower resolution. However, most of the subgrid models 
(and in particular the Smagorinsky model) perform adequately in a regime of 
homogeneous isotropic turbulence, if the filter cut-off length is inside the inertial 
range. Therefore, in such conditions excessive refinement is not necessary and one 
can let the subgrid scale model simulate the turbulent dissipation. For this reason, the 
contribution due to homogeneous isotropic turbulence is removed from the structure 
function (7). This contribution, as discussed in detail in [10], can be written as 


$$ D_{ij}^{iso}(\mathbf{r}, t) = D_{NN}(r, t) \, \delta_{ij} + \left( D_{LL}(r, t) - D_{NN}(r, t) \right) \frac{r_i r_j}{r^2}, \qquad (8) $$

where $r = \|\mathbf{r}\|$ and $D_{LL}$, $D_{NN}$ are the longitudinal and transverse structure
functions, respectively. Once $\mathbf{r}$ is known, only $D_{LL}$ and $D_{NN}$ need to be determined.
The procedure to compute the error indicator can then be described as follows: 


1. choose a pair of points defining $\mathbf{x}$ and $\mathbf{r}$ in $K$

2. compute the structure function $D_{ij}(K)$ based on $\mathbf{x}$, $\mathbf{r}$ and the simulated velocity
field

3. compute $D_{NN}$ and $D_{LL}$ by a least-squares fit of (8) to the structure function values
within the element

4. define the degree adaptation indicator as:

$$ Ind_{SF}(K) = \sqrt{ \sum_{ij} \left[ D_{ij}(K) - D_{ij}^{iso}(K) \right]^2 }. \qquad (9) $$
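The four steps above can be sketched in code. This is only an illustration under stated assumptions, not the implementation used in dg-comp: the angle brackets in (7) are taken as a plain average over a set of point pairs inside the element, and $D_{LL}$, $D_{NN}$ are treated as two constants per element. With these assumptions the two tensors multiplying $D_{LL}$ and $D_{NN}$ in (8) are orthogonal in the Frobenius inner product, so the least-squares fit has the closed form used below. The function name and the input layout are hypothetical.

```python
import math


def structure_function_indicator(pairs):
    """pairs: list of (du, r), where du = u(x + r) - u(x) and r is the separation (3-tuples).

    Returns (indicator, D_LL, D_NN) following eqs. (7)-(9).
    """
    m = len(pairs)
    d_ij = [[0.0] * 3 for _ in range(3)]   # D_ij(K): average of du_i * du_j
    nn = [[0.0] * 3 for _ in range(3)]     # average of n_i * n_j with n = r / |r|
    d_ll = 0.0
    d_nn = 0.0
    for du, r in pairs:
        r_norm = math.sqrt(sum(c * c for c in r))
        n = [c / r_norm for c in r]
        du_long = sum(a * b for a, b in zip(du, n))   # longitudinal increment
        du_sq = sum(a * a for a in du)
        # Closed-form least-squares solution of the two-parameter fit of eq. (8).
        d_ll += du_long * du_long / m
        d_nn += 0.5 * (du_sq - du_long * du_long) / m
        for i in range(3):
            for j in range(3):
                d_ij[i][j] += du[i] * du[j] / m
                nn[i][j] += n[i] * n[j] / m
    total = 0.0
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            iso = d_nn * delta + (d_ll - d_nn) * nn[i][j]   # eq. (8), averaged over pairs
            total += (d_ij[i][j] - iso) ** 2
    return math.sqrt(total), d_ll, d_nn
```

For a purely transverse increment, e.g. `structure_function_indicator([((0.0, 0.0, 1.0), (0.1, 0.0, 0.0))])`, the fit cannot reproduce the anisotropic sample with an isotropic tensor and the indicator is nonzero; for a vanishing increment it is zero.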


The static adaptivity procedure presented in [10] is able to produce accurate 
results with a significant reduction in computational cost. For the simulation of 
transient phenomena, however, a dynamic adaptivity approach must be applied. The 
goal of this work, which summarizes results presented in [9], is to extend the above 
approach to dynamical adaptation, which was successfully employed in the inviscid 
case in [11, 12]. 

In those papers, in which special time discretization approaches were employed
that allow the use of very long time steps, the adaptation process was performed at
each time step. In the dynamically adaptive simulations presented here, instead,
which are carried out with a relatively small time step, the structure function
indicator $Ind_{SF}(K)$ is computed every $n_i(K)$ time steps and the average of $s_i(K)$
subsequent values of this quantity is computed. Then, every $n_i(K) \times s_i(K)$ time
steps, based on the resulting indicator value in each element, either the polynomial
degree is left unchanged or it is updated along with the solution representation.
Since the solution is expressed in terms of a hierarchical basis (4), when lowering
the polynomial degree, the contribution bound to the removed modes is simply
discarded, while when raising the polynomial degree the coefficients of the newly
added modes are set to zero, to be populated when the integrals over the element and
faces couple the old modes with the newly introduced ones.
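The update of the solution representation described above amounts to truncating or zero-padding the vector of modal coefficients. A minimal sketch, assuming the coefficients of one element and one variable are stored as a flat list ordered by increasing polynomial degree (a storage layout assumed here for illustration, with a hypothetical function name):

```python
def n_modes(q):
    """Number of 3D basis functions of total degree <= q, cf. eq. (3)."""
    return (q + 1) * (q + 2) * (q + 3) // 6


def change_degree(coeffs, new_q):
    """Return the modal coefficients of one element after switching to degree new_q.

    Lowering the degree discards the trailing modes; raising it appends zeros,
    which are later populated by the time integration.
    """
    n_new = n_modes(new_q)
    kept = list(coeffs[:n_new])
    return kept + [0.0] * (n_new - len(kept))


if __name__ == "__main__":
    degree3 = [float(k) for k in range(n_modes(3))]
    print(len(change_degree(degree3, 2)), len(change_degree(degree3, 4)))
```

Because the first coefficient is never removed, the element mean value is preserved by both operations.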

Notice that, in the present implementation, no dynamic load balancing has been
implemented for parallel runs. This means that, during the parallel execution,
the dynamic change in the number of degrees of freedom could potentially lead to
load imbalances between different processors. At the moment the balancing is
generally executed using a static polynomial distribution. While avoiding excessive
imbalance, this is definitely not the optimal approach and more effective load
balancing techniques will have to be investigated in the future.
3 Dynamical Adaptivity Experiments


The proposed dynamic adaptation criterion has been tested in the simulation of an
isolated vortex superimposed on a uniform horizontal flow [5]. This simple test
has been chosen for the preliminary study reported here, in anticipation of the more
complex tests already discussed in [9], in which the same isolated vortex impinges
on an obstacle. The DG-LES approach described in [1] was applied, as in [10],
with a standard Smagorinsky model for the subgrid stresses. A coarser and a finer
mesh have been employed, both based on fully unstructured tetrahedra of constant
characteristic length, equal to 1 and 0.5, respectively. The indicator (9)
is computed every $n_i(K) = 2$ time steps and $s_i(K) = 10$ subsequent values
are averaged, in order to adapt the resolution every 20 time steps. The sensitivity
of the results to these parameters has not yet been analysed
and will be the focus of future study. As in [10], two threshold values $\epsilon_1$, $\epsilon_2$
are used to determine p-refinement and p-derefinement. More specifically, the cells
with indicator values smaller than $\epsilon_1$ are assigned polynomial degree 2, those with
indicator values larger than $\epsilon_2$ are assigned polynomial degree 4, while the others are
assigned polynomial degree 3. The threshold values employed are given by $\epsilon_1 = 1 \times 10^{-4}$,
єз = 1 x 107. Following [10], these values were chosen so as to achieve on 
average a total number of degrees of freedom slightly smaller than that required 
by a uniform degree simulation with p = 3. The dynamic adaptation procedure
is able to effectively increase the polynomial degree around the vortex and follow 
it as it is advected downstream, leaving all the elements with no vortex activity at 
the lowest resolution. A map of the polynomial degrees in the domain during the 
advection of the vortex is shown in Fig. 1. 
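The sampling, averaging and thresholding logic of this section can be summarized in a short sketch. It is a schematic illustration for a single element, not the solver's code: the class name, its interface and the threshold values passed in the usage example are assumptions chosen for the illustration.

```python
class DegreeAdapter:
    """Samples the indicator every n_steps steps, averages s_samples values and
    maps the average to a polynomial degree (2, 3 or 4) via two thresholds."""

    def __init__(self, n_steps, s_samples, eps1, eps2):
        self.n_steps = n_steps
        self.s_samples = s_samples
        self.eps1 = eps1
        self.eps2 = eps2
        self.samples = []

    def step(self, step_index, compute_indicator):
        """Return the new degree every n_steps * s_samples steps, otherwise None."""
        if step_index % self.n_steps != 0:
            return None
        self.samples.append(compute_indicator())
        if len(self.samples) < self.s_samples:
            return None
        mean = sum(self.samples) / len(self.samples)
        self.samples.clear()
        if mean < self.eps1:
            return 2
        if mean > self.eps2:
            return 4
        return 3


if __name__ == "__main__":
    # Placeholder thresholds, used only to exercise the logic.
    adapter = DegreeAdapter(n_steps=2, s_samples=10, eps1=1e-4, eps2=1e-2)
    for k in range(1, 41):
        degree = adapter.step(k, lambda: 5e-3)
        if degree is not None:
            print(f"step {k}: degree set to {degree}")
```

With $n_i(K) = 2$ and $s_i(K) = 10$ the degree is re-evaluated at steps 20, 40, and so on, matching the adaptation interval of 20 time steps quoted above.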

The profiles of velocity magnitude recorded during time, along the path of 
the vortex, at different distances from the vortex starting point, employing the 
coarsest mesh, are presented in Fig. 2. The simulations obtained at different uniform 
polynomial orders are compared with the adaptive results. It can be observed that, 
even at the highest uniform resolution of degree 4 the velocity profile is distorted 
during the advection, due to the very limited grid resolution. However, the vortex 
does not diffuse and dissipate excessively, as opposed to the low resolution uniform 




Fig. 1 Polynomial degree values following the advected vortices on the (a) coarse and (b) fine 
mesh; green color corresponds to polynomial degree 3, red color corresponds to polynomial 
degree 4 


Fig. 2 Profiles of velocity magnitude recorded during time in the vortex path centreline at different 
distances from vortex starting point, comparison of uniform degree simulations and dynamic 
adaptive one on the coarse mesh 


degree 2 simulation in which the vortex is quickly dissipated. The behaviour of the 
adaptive simulation is generally mid way between the uniform degree 4 and the 
uniform degree 3 results. 

The comparison with the uniform high degree simulations can be more easily
observed in Fig. 3, which shows the difference of the velocity magnitude profiles with
respect to the uniform degree 4 results, still for the coarse mesh case. In the locations
nearer to the starting position of the vortex the adaptive simulation appears close to
the degree 4 solution when the first part of the vortex is passing, while a slight
difference appears in the second part of the vortex, which is however always within
the error of the uniform degree 3 simulation. In the locations farther from the initial
starting point of the vortex, which sense the vortex passage after a longer advection
time, the adapted simulation is always very close to the uniform degree 4 solution.
It has to be noted that the average number of degrees of freedom of the adaptive
simulation is 41,488, which remains almost constant throughout the simulation. This
is 10.8% more than the 37,430 degrees of freedom needed for the uniform degree
2 solution, 44.6% less than the 74,860 degrees of freedom of the uniform degree 3
resolution, which is always outperformed by the adaptive one, and 68.3% less than
the uniform degree 4 simulation.
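The percentages quoted for the coarse mesh can be cross-checked against the mode count (3). The sketch below assumes that the reported numbers of degrees of freedom count modes per scalar variable, so that the number of elements follows from the degree 2 figure; this interpretation is an inference from the numbers, not a statement of the text.

```python
def n_modes(q):
    """Number of 3D basis functions of total degree <= q, cf. eq. (3)."""
    return (q + 1) * (q + 2) * (q + 3) // 6


def percent_change(value, reference):
    """Signed relative difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Assumed: 37,430 degrees of freedom at uniform degree 2 means 10 modes per element.
n_elements = 37430 // n_modes(2)
uniform = {q: n_elements * n_modes(q) for q in (2, 3, 4)}
adaptive = 41488
changes = {q: round(percent_change(adaptive, uniform[q]), 1) for q in uniform}

if __name__ == "__main__":
    print(n_elements, uniform, changes)
```

Under this assumption the uniform degree 4 simulation would use 131,005 degrees of freedom, and the three relative differences agree with the values reported in the text.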

To correctly assess the effects of adaptivity in the case of the refined mesh, 
we study the difference of the various results with respect to the uniform degree 
4 one, presented in Fig. 4. The differences are generally very small, even for the
Fig. 3 Difference of velocity magnitude with respect to the most refined simulation at uniform
degree 4, recorded during time in the vortex path centreline at different distances from vortex
starting point, on coarse mesh


Fig. 4 Difference of velocity magnitude with respect to the most refined simulation at uniform
degree 4, recorded during time in the vortex path centreline at different distances from vortex
starting point, on fine mesh
lowest resolution; however, it is possible to note how the adaptive results are always
comparable to the uniform degree 3 results, and at many points better. Nonetheless,
the improvement created by the adaptivity is more limited than in the coarse case,
mainly due to the fact that the mesh by itself is sufficient to resolve the vortex. In
this case the average number of degrees of freedom of the adaptive case is 170,470,
which is 5.7% more than the 161,320 degrees of freedom of the uniform degree 2
case, 47.2% less than the uniform degree 3 case and 70.0% less than the uniform
degree 4 case. The differences in vorticity profiles between the simulation at
uniform degree 4 and the lower resolution simulations are also presented, in Fig. 5 for
the coarse resolution and in Fig. 6 for the finer resolution. By comparing the results
at the two different resolutions it is possible to note, also for the vorticity, that at the
finer resolution the large scale phenomenon is correctly represented by almost
all polynomial degrees, with minimal vorticity dissipation, while at the coarser
resolution only the higher polynomial degree, as well as the adaptive simulation,
avoids an excessive dissipation of vorticity.

At the coarser resolution, the difference of the adaptive simulations with respect
to the uniform degree 4 ones is smaller than the differences of the other
uniform degree simulations (Fig. 5), showing that with the adaptation it is also
possible to obtain a better resolution of the vorticity profiles. The same is true also
at the finer resolution (Fig. 6). In the dynamically adaptive simulations spurious
acoustic waves seem to be produced by the dynamical adaptation process, see




Fig. 5 Difference of vorticity magnitude with respect to the most refined simulation at uniform 
degree 4, recorded during time in the vortex path centreline at different distances from vortex 
starting point, on coarse mesh 
Fig. 6 Difference of vorticity with respect to the most refined simulation at uniform degree 4,
recorded during time in the vortex path centreline at different distances from vortex starting point, 
on fine mesh 




Fig. 7 Pressure time derivative in the adaptive simulation of vortex advection on (a) coarse mesh,
(b) finer mesh, at time T = 4; in both plots, the represented quantity takes values in the interval
[-0.1, 0.1]


Fig. 7. These spurious disturbances were not observed in the dynamically adaptive
tests presented in [11, 12], which employed an implicit time discretization that
strongly damps these high frequency solution components. However, as can
be seen by inspecting the time series of the pressure values (not reported here due
to the limited space available), these disturbances decrease rapidly in amplitude
on the finer mesh and do not seem to propagate through the domain but rather
follow the advected vortex. This spurious feature warrants further investigation of
the dynamical adaptation approach if a correct approximation of acoustic waves is
desired.
4 Conclusions


The novel degree adaptation criterion for LES simulations in adaptive DG
frameworks proposed in [10], tested so far only in statically adaptive simulations, has
also been employed in dynamically adaptive simulations. Numerical results in the
benchmark case of the advection of an isolated vortex have been presented. These
results are meant as a preliminary step towards the study of more complex configurations in
which the same isolated vortex impinges on an obstacle. The presented results show
that the proposed criterion is also effective in the dynamical case. With a coarse
basic mesh resolution the effects of p-adaptivity are significant, leading to results
close to the ones obtained with the maximum resolution allowed by the polynomial
basis, while when the mesh resolution is already suitable to represent the vortex even
with the lowest polynomial degrees the adaptivity still leads to accurate results,
but with an even higher reduction of the number of degrees of freedom with respect
to the non-adaptive solutions. In a subsequent work, the results obtained in [9] for
the case of the isolated vortex impinging on an obstacle will be presented, along
with other applications to fully three-dimensional turbulent flows.
A Novel Eighth-Order Diffusive Scheme
for Unstructured Polyhedral Grids Using
the Weighted Least-Squares Method


Duarte M. S. Albuquerque, Artur G. R. Vasconcelos, and Jose C. F. Pereira 


1 Introduction 


The numerical solution of transport phenomena in complex geometrical domains 
is a subject of continuous development regarding three characteristics: accuracy, 
robustness and efficiency. The geometrical complexity can be handled with different 
grid topologies and the understanding of their issues is relevant for industrial 
applications. High-order computation is a demanding issue, motivated by a potential 
reduction of computational cost for complex computational fluid dynamics (CFD) 
problems. 

High-order accurate methods for unstructured grids have historically been
focused on hyperbolic equations, see e.g. Lê et al. [1]. Barth and Frederickson [2]
developed a high-order Finite Volume Method (FVM) for the solution of the
Euler equations, using a quadratic polynomial. The coupling of the Euler system with
viscous terms, which requires diffusive schemes, was achieved by Ollivier-Gooch et
al. [3].

In recent years, high-order methods have been developed for the
solution of parabolic and elliptic problems on unstructured grids, see e.g. Boularas
et al. [4]. The range of possible applications varies from Poisson problems, see
Batty [5], and heat transfer problems, see e.g. Chantasiriwan [6], to diffusion equations
with variable coefficients, see Zhai [7], or discontinuous coefficients, see e.g. Clain
et al. [8].

Several polynomial reconstruction techniques applied to FVM can be
highlighted: the fourth-order methods of Ollivier-Gooch et al. [9], Cueto-Felgueroso
et al. [10], and Nogueira et al. [11]; sixth-order results have also been reported by
Clain et al. [12]. The objective of this work is to extend the weighted least-squares
(WLS) method to very high-order schemes and polyhedral unstructured grids.

Regarding other applications of the weighted least-squares technique,
Magalhaes et al. [13] and Albuquerque et al. [14] have developed, respectively, relative and
absolute error estimators for second-order finite volume schemes with unstructured
grids. Martins et al. [15, 16] have created a third-order interpolation method with
a divergence-free constraint for immersed boundary applications, for
Cartesian and unstructured polyhedral grids, respectively.

The remainder of this manuscript is divided into four sections: in Sect. 2 the implemented
method for two dimensions is briefly described; in Sect. 3 the verification of
the implemented schemes, with Cartesian and perturbed grids, is carried out;
Sect. 4 shows the results for a case with irregular polyhedral and triangular grids
and proposes a novel method to treat the Neumann boundary conditions; Sect. 5
concludes the manuscript with a summary of the principal achievements of this
work.


2 Elliptical Operator for Unstructured Grids with the Least 
Squares Technique 


In this work, the Poisson equation will be solved, which is defined by:

$$ \nabla \cdot \nabla \phi = \phi_0, \qquad (1) $$

where $\phi$ is the transported variable and $\phi_0$ is the source term, which is required when
using manufactured analytical solutions and is equal to the Laplacian of the analytical solution. After
applying the classic Finite Volume method to the Poisson equation, the following
equation is obtained:

$$ \sum_{f \in F(P)} \sum_{g \in G(f)} w_g \, \nabla \phi_g \cdot \mathbf{S}_f = \int_{V_P} \phi_0 \, dV, \qquad (2) $$

where $F(P)$ is the set of faces of cell $P$, $G(f)$ is the set of Gauss points of the face
$f$, $\mathbf{S}_f$ is the face normal vector and $w_g$ is the weight of the Gauss-Legendre quadrature.
The important part of this method is how the calculation of the face gradient $\nabla \phi_g$ is
carried out at each Gauss point. This will be explained in the next subsection.


2.1 Polynomial Reconstructions 


To obtain the gradients values at the integration points, a reconstruction of the 
unknown primitive variable is performed at the face centroid, using a polynomial 
expansion. 
Table 1 Number of terms of the Taylor expansion required for a p-th order polynomial in
two-dimensional (2D) cases

p-th order polynomial    1    3    5    7
Number of terms          3   10   21   36


The number of terms of the polynomial has to take into account the required
order of the scheme, and the polynomial has the following form:

$$ \phi^p(x, y) = C_1 + C_2 (x - x_f) + C_3 (y - y_f) + C_4 (x - x_f)^2 + C_5 (x - x_f)(y - y_f) + C_6 (y - y_f)^2 + \ldots \qquad (3) $$

Expression (3) can be written in a more compact vector form as:

$$ \phi^p(\mathbf{x}) = \mathbf{d}_f(\mathbf{x}) \, \mathbf{c}_f, \qquad (4) $$

where the subscript $f$ indicates that the reconstruction is made at the face $f$,
$\mathbf{d}_f(\mathbf{x}) = [1, (x - x_f), (y - y_f), (x - x_f)^2, (x - x_f)(y - y_f), (y - y_f)^2, \ldots]$,
$\mathbf{x}_f = (x_f, y_f)$ is the face centroid coordinates vector, $\mathbf{x} = (x, y)$
is the coordinates vector of a point used for the reconstruction and
$\mathbf{c}_f = [C_1, C_2, C_3, C_4, C_5, C_6, \ldots]^T$ are the reconstruction constants.

Table 1 lists the number of terms of the expansion for each polynomial used in 
this work. 

The order of accuracy of the numerical scheme is p + 1; consequently the linear
reconstruction will be second-order accurate, the cubic reconstruction will be fourth-order
accurate, the fifth-degree polynomial will be sixth-order accurate and finally the
seventh-degree polynomial will be eighth-order accurate. The numerical schemes will be
called FLS(p + 1) according to the global order of the implemented method.
For each order a minimum number of Gauss points is required to maintain the
respective quadrature order.
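The term counts of Table 1 and the row vector of expression (4) can be reproduced with a short sketch. The monomial ordering below follows expression (3); the function names are illustrative and are not taken from the authors' code.

```python
def n_terms(p):
    """Number of terms of a complete 2D polynomial of degree p."""
    return (p + 1) * (p + 2) // 2


def basis_row(x, y, xf, yf, p):
    """Row d_f(x) of eq. (4): monomials in (x - xf), (y - yf) up to total degree p."""
    dx, dy = x - xf, y - yf
    return [dx ** (k - j) * dy ** j for k in range(p + 1) for j in range(k + 1)]


if __name__ == "__main__":
    for p in (1, 3, 5, 7):
        print(f"FLS{p + 1}: degree {p}, {n_terms(p)} terms")
    print(basis_row(1.5, 2.0, 1.0, 1.0, 2))
```

The count grows quadratically with the degree, which is why each successive scheme needs a larger stencil.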


2.2 General Approach 


The Weighted Least Squares (WLS) method is a technique used to solve
overdetermined problems, where there are more independent equations than unknowns.
Equation (4) results in a system of linear equations, which has the form:

$$ \mathbf{D}_f \mathbf{c}_f = \boldsymbol{\phi}_s, \qquad (5) $$
Fig. 1 Examples of different vertex neighbours order from the red face


where $\mathbf{D}_f$ is a combination of $\mathbf{d}_f(\mathbf{x})$ for every point of the reconstruction, resulting
in a matrix with $n_s \times n_{coefs}$ entries. Here $\mathbf{c}_f$ is a column vector with $n_{coefs}$ entries,
$\boldsymbol{\phi}_s$ is a column vector with $n_s$ entries, $n_{coefs}$ is the number of constants of the
p-th order polynomial and $n_s$ is the size of the computational stencil, which is the set of
the computational values and points used in the reconstruction and is made of cell
neighbours of the face. Since $n_s > n_{coefs}$, the problem is overdetermined and so the
WLS technique is used in order to minimize the weighted residual of the problem.
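A weighted least-squares solution of (5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight function (a bounded inverse-distance weight) is an assumption, since the text does not specify it, and the normal equations are used only for brevity. For high polynomial degrees a QR or SVD based solver would normally be preferred because the normal equations square the condition number.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def wls_fit(points, values, face, p, weight=lambda d: 1.0 / (1.0 + d * d)):
    """Reconstruction constants c_f minimizing the weighted residual of D_f c_f = phi_s.

    points: stencil coordinates (x, y); values: phi at those points;
    face: face centroid (xf, yf); weight: assumed function of the distance to the face.
    """
    xf, yf = face
    rows = [[(x - xf) ** (k - j) * (y - yf) ** j
             for k in range(p + 1) for j in range(k + 1)] for x, y in points]
    w = [weight(((x - xf) ** 2 + (y - yf) ** 2) ** 0.5) for x, y in points]
    n = len(rows[0])
    # Normal equations (D^T W D) c = D^T W phi.
    lhs = [[sum(wk * r[i] * r[j] for wk, r in zip(w, rows)) for j in range(n)]
           for i in range(n)]
    rhs = [sum(wk * r[i] * v for wk, r, v in zip(w, rows, values)) for i in range(n)]
    return solve(lhs, rhs)
```

For a linear field sampled on five points, `wls_fit` with `p = 1` recovers the value and the two gradient components at the face centroid, i.e. the constants $C_1$, $C_2$ and $C_3$ of expression (3).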

To solve this problem, specific stencils must be used for each scheme order. This
is done by using vertex neighbours, following the experience of the authors in
a previous work [14]. Each successive scheme order requires a larger stencil to
respect the $n_s > n_{coefs}$ condition. Figure 1 shows examples of these stencils for a
regular polyhedral and a triangular grid. Basically, each successive order of vertex neighbours
(from 1 to 4) is used for a scheme with an even order of accuracy. For example, the
second-order scheme only needs the first order of vertex neighbours of the face
marked in red.

Other details of the construction of the global matrix $A_{ij}$ are described in the
work of Vasconcelos et al. [17]. Each line of the global matrix $A_{ij}$ corresponds
to the diffusive discretization of the cell $i$ and has to consider the diffusive flux
integral for each face of the cell. This flux integral is computed from the polynomial
reconstruction centred at the respective face, which was described previously.
Finally, the high-order diffusive fluxes can be written in the following matrix form:

$$ A_{ij} \phi_j = \sum_{f \in F(i)} \left( \sum_{g \in G(f)} w_g \, \mathbf{t}_{fj}(\mathbf{x}_g) \, \phi_j \right) \cdot \mathbf{S}_f, \qquad (6) $$

where $F(i)$ is the set of faces of cell $i$, $G(f)$ is the set of Gauss-Legendre points
of face $f$, $w_g$ is the weight, $\mathbf{x}_g$ are the coordinates of each Gauss-Legendre point
$g$ and $\mathbf{t}_{fj}$ is the contribution from cell $j$ to the diffusive flux at face $f$ from the
reconstructed polynomial. The set of cells $j$ is defined by the stencil used in the
polynomial reconstruction at each face $f$; globally, each line $i$ will have contributions
from all cells that result from the union of the sets from all faces of cell $i$.


3 Order Convergence Verification with Cartesian 
and Perturbed Non-uniform Grids 


A numerical test is performed on non-uniform grids with a certain imposed
displacement. This perturbation is done by randomly moving the grid lines in a
range between zero and a percentage ($\gamma$) of the grid size from the Cartesian grid counterpart.
This perturbation can be done in either a positive or negative direction. The cells of
the grids are always squares and a reference grid without any perturbation is used,
i.e. a Cartesian one. For this case, the following analytical solution was used and
solved in a 1 x 1 square domain:

$$ \phi(x, y) = \exp\left( -\frac{(x - 0.5)^2 + (y - 0.5)^2}{0.0175} \right). \qquad (7) $$
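Since the source term of the Poisson equation is the Laplacian of the manufactured solution, it can be derived in closed form from (7). The sketch below writes it out and compares it with a second-order finite-difference Laplacian; the function names, the sample point and the step size are arbitrary choices made for this illustration.

```python
import math

A = 0.0175  # denominator of the exponent in eq. (7)


def phi(x, y):
    """Manufactured solution of eq. (7)."""
    return math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / A)


def source(x, y):
    """Analytical Laplacian of phi, used as the source term of the Poisson equation."""
    s = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return (4.0 * s / A ** 2 - 4.0 / A) * phi(x, y)


def fd_laplacian(x, y, h=1e-4):
    """Second-order central finite-difference Laplacian, for checking only."""
    return (phi(x + h, y) + phi(x - h, y) + phi(x, y + h) + phi(x, y - h)
            - 4.0 * phi(x, y)) / (h * h)


if __name__ == "__main__":
    print(source(0.55, 0.45), fd_laplacian(0.55, 0.45))
```

The two values agree to within the truncation error of the finite-difference formula, which confirms the algebra of the closed-form expression.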


Table 2 lists the error ratios, $r$, between a grid with an imposed perturbation and
a regular one, for both the mean and maximum error on the finest
grid studied. Particularly for the FLS6 and FLS8 schemes, the error can be one
order of magnitude greater than that obtained with the Cartesian grid. It is also shown
that an imposed perturbation of up to 20% incurs a low numerical error penalty.

Figure 2 shows the convergence curves obtained, but only for the FLS4 and
FLS8 schemes. It is possible to observe that the theoretical convergence order is
achieved for every perturbed grid.

Figure 3 shows the error distribution for the FLS8 scheme with a Cartesian and a
perturbed grid with $\gamma$ = 30%. It is shown that the error distribution is severely
changed by the perturbation imposed on the grid.


Table 2 Ratio of mean and maximum error norms for all schemes between grids with an imposed
perturbation and a Cartesian one with 25,600 cells

        FLS2            FLS4            FLS6            FLS8
γ%      r_1    r_∞      r_1    r_∞      r_1    r_∞      r_1     r_∞
10      1.01   1.54     1.52   4.44     1.86   10.35    2.20    8.19
20      1.09   1.55     2.33   4.72     2.81   15.15    3.62    11.05
30      1.22   1.78     3.24   7.53     3.89   22.30    17.28   30.61
Fig. 2 Convergence curves for the imposed perturbed grids with FLS4 and FLS8

Fig. 3 Error distribution for the FLS8 scheme and two grids with 25,600 cells: one without any
perturbation (left) and one with an imposed perturbation (right)


4 Results for Several Grid Types and with Neumann 
Boundary Conditions 


To verify the applicability of the proposed schemes to other grid types and Neumann
boundary conditions, a numerical verification was performed with an analytical
solution in a square domain, $[0, 1]^2$. A Neumann boundary condition was imposed
at the vertical faces and a Dirichlet boundary condition at the remaining ones.

Two different approaches were used when considering Neumann boundary
conditions. The first approach is the classic one, which consists simply in the differentiation
of the respective line of the least-squares matrix that represents the boundary face.
It will be referred to as the general case (GC) approach.

The second approach, which is new to the authors' knowledge, consists
of the multiplication of each line of the $\mathbf{D}_f$ matrix that refers to a Neumann boundary
face, $b$, by the respective face area, $S_b$. That line is written as $\nabla \mathbf{d}_f(\mathbf{x}_b) \cdot \mathbf{S}_b$
instead of $\nabla \mathbf{d}_f(\mathbf{x}_b) \cdot \mathbf{n}_b$, and the corresponding entry of the vector $\boldsymbol{\phi}_s$ is given by $\nabla \phi_b \cdot \mathbf{S}_b$ instead
of $\nabla \phi_b \cdot \mathbf{n}_b$. Consequently, the problem takes the following form:


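$$
\left[ \begin{array}{ccccc}
1 & (x_1 - x_f) & (y_1 - y_f) & (x_1 - x_f)^2 & \cdots \\
1 & (x_2 - x_f) & (y_2 - y_f) & (x_2 - x_f)^2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \\
\hline
0 & S_{b_x} & S_{b_y} & 2 (x_b - x_f) S_{b_x} & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array} \right]
\left[ \begin{array}{c} C_1 \\ C_2 \\ \vdots \\ C_{n_{coefs}} \end{array} \right]
=
\left[ \begin{array}{c} \phi_1 \\ \phi_2 \\ \vdots \\ \hline \nabla \phi_b \cdot \mathbf{S}_b \\ \vdots \end{array} \right], \qquad (8)
$$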


where the line in the matrix separates the contribution of the stencil cells and 
Dirichlet faces from the contribution of the Neumann faces of the current considered 
stencil. 

The goal of this operation is to ensure that the vector $\boldsymbol{\phi}_s$ and each line of the
least-squares matrix $\mathbf{D}_f$ have the same physical units, something that does not
happen with the classic approach. This approach will be referred to as the dimensional
correction for Neumann boundary conditions (DCN).
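The difference between the two treatments is confined to the vector with which the gradient of the basis row is contracted. A minimal sketch, assuming the monomial ordering of expression (3) and a hypothetical helper name: passing the unit normal $\mathbf{n}_b$ gives the GC row, while passing the area vector $\mathbf{S}_b = S_b \mathbf{n}_b$ gives the DCN row.

```python
def neumann_row(xb, yb, xf, yf, p, vec):
    """Row grad(d_f)(x_b) . vec of the least-squares matrix for a Neumann face.

    vec = unit normal n_b        -> classic GC row
    vec = area vector S_b        -> dimensionally corrected DCN row
    """
    dx, dy = xb - xf, yb - yf
    row = []
    for k in range(p + 1):
        for j in range(k + 1):
            i = k - j                                   # monomial dx**i * dy**j
            ddx = i * dx ** (i - 1) * dy ** j if i > 0 else 0.0
            ddy = j * dx ** i * dy ** (j - 1) if j > 0 else 0.0
            row.append(ddx * vec[0] + ddy * vec[1])
    return row


if __name__ == "__main__":
    area = 0.2
    normal = (1.0, 0.0)
    gc = neumann_row(1.0, 0.5, 0.8, 0.5, 2, normal)
    dcn = neumann_row(1.0, 0.5, 0.8, 0.5, 2, (area * normal[0], area * normal[1]))
    print(gc)
    print(dcn)
```

The DCN row is the GC row scaled by the face area, so the corresponding right-hand-side entry must be scaled in the same way, as in the boundary rows of the system written above.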

Numerical tests were performed for two grid types: irregular polyhedral and 
triangular grids. The analytical solution is given by: 


$$ \phi(x, y) = \sin(3 \pi x) \sin(3 \pi y), \qquad (9) $$


where on the Neumann boundaries the face flux will be $\nabla \phi_b \cdot \mathbf{S}_b \neq 0$.

Figure 4 shows the convergence curves for both approaches applied to all
schemes with the irregular polyhedral (left) and triangular grids (right). The solid
line represents the DCN approach and the dotted one represents the classic GC
approach. The results point out that the theoretical convergence order is always

Fig. 4 Convergence curves of the mean error for mixed boundary conditions with irregular 
polyhedral and triangular grids for all schemes. The dotted lines are the convergence curves for 
the GC approach and the solid ones represent the convergence curves with the DCN approach 




Table 3 Comparison between the two approaches for a problem with an imposed Neumann BC 
for all schemes applied to the irregular polyhedral grids 


Polyhedral FLS6 
us Lii | 0% 
545 126 
2113 231 
s321 [1126-02 |099 |104 |108 [116 |104 |108 |2157 | 74.12 
33,024 361.82 


Table 4 Comparison between the two approaches for a problem with an imposed not-null 
Neumann BC for all schemes applied to the triangular grids 


Triangular 
ее И m ppm gn [re СИ го 

m ass | osi 
399 053 
3638 1.98 
14,632 190.13 | 297.64 
58,608 790.73 |915.16 


achieved for both grids and indicate that the DCN approach improves the schemes'
performance, which is more evident for the FLS2 and FLS8 schemes, especially for
the latter. The behaviour on the finest grids is more stable with the DCN approach.

Table 3 lists the comparison of the two approaches used for the Neumann BC
for the irregular polyhedral grid. The comparison is made through the ratio between
both approaches, using the mean and maximum error norms; $r_i$ is computed by:

$$ r_i = \frac{\| e \|_i^{GC}}{\| e \|_i^{DCN}}, \qquad (10) $$

where $i$ is the error norm used for the calculation.

The results show that the biggest decrease of the error occurs for the maximum
error. For the FLS8 scheme the error can be reduced by a factor of up to 21, since the new
method avoids the truncation error issue present in the GC approach and shown in Fig. 4.

Table 4 lists the comparison between the two approaches used for the Neumann
BC with the triangular grids. The results obtained allow us to conclude that the major
decrease of the numerical error occurs for the maximum error, which can be reduced
by almost one order of magnitude for the second-order scheme and by half with the
fourth-order scheme. For the sixth-order scheme the maximum error with this new
approach is slightly worse, by almost 10%, than with the general approach; however, in terms
of mean error the gain is evident, since the mean error is reduced by half with the
DCN approach. For the eighth-order scheme, it is possible to reduce the error by
about three orders of magnitude, since the new approach avoids the truncation error issue.
5 Conclusions


Verification tests have been performed for a new high-order scheme based on the
weighted least-squares technique and the Finite Volume method. The convergence
curves have shown an excellent behaviour, indicating that the theoretical order is
achieved for all cases studied. Also, the new reconstruction method is not very
sensitive to the imposed perturbations in the grid or to the topology of the cells.

Additionally, the results supported the novel approach proposed to treat the
Neumann boundary conditions, which improves the quality of the solution. These results
are the expected ones, since in the WLS problem the physical dimensions of the matrices are
consistent with each other when using the proposed approach.
An Explicit Mapped Tent Pitching
Scheme for Maxwell Equations


Jay Gopalakrishnan, Matthias Hochsteger, Joachim Schöberl, 
and Christoph Wintersteiger 


1 Introduction 


Electromagnetic waves propagate at the speed of light. Thus, the field at a certain 
point in space and time depends only on field values within a dependency cone. A 
tent pitching method introduces a special “causal” spacetime mesh that respects 
this finite speed of propagation. It is not limited to Maxwell equations, but can 
be applied to general hyperbolic equations. A tent pitching method requires a 
numerical scheme to discretize the equation on that mesh. Discontinuous Galerkin 
(DG) methods are of particular interest since they offer a systematic avenue to 
build high order methods. For a given initial condition at the bottom of a tent, 
the discrete equations may be solved within each individual tent, up to the tent 
top. The computed solution at the tent top provides initial conditions for the tents 
that follow later in time. This method is highly parallel, since many tents can be 
solved independently. Methods using such tent-pitched meshes may be traced back 
to [5, 7]. More recent works [1, 6, 8] develop Spacetime DG (SDG) methods within 
tents by formulating local variational problems, for which linear systems are set up 
and solved. Although these systems are local, the matrix size can grow rapidly with 
the polynomial order, especially in four-dimensional spacetime tents. In this context 
it is natural to ask if one can develop explicit schemes (which usually perform well 
under low memory bandwidth) that take advantage of tents. 

A key ingredient to answer this question was presented in [2], where Mapped 
Tent Pitching (MTP) schemes were introduced. The MTP discretization, which 
proceeds by mapping tents to a spacetime cylinder, allows one to evolve the solution 
either implicitly or explicitly within tents. The memory requirements of the explicit 
MTP scheme are limited to what is needed for storing the spatial mesh, the solution 
coefficients at one time step, and the topology of the tents. 

In this work, we show that notwithstanding the above-mentioned advantages 
of the explicit MTP scheme, one may lose higher order convergence if a naive 
time stepping strategy (involving a standard explicit Runge-Kutta scheme) is used. 
We then develop a new Taylor time-stepping for the local problems within tents. 
Despite its simplicity, our numerical experiments show that it delivers optimal order 
of convergence. 


2 Mesh Generation by Tent Pitching 


We start with a conforming spatial mesh consisting of elements $\mathcal{T} = \{T\}$ 
and vertices $\mathcal{V} = \{V\}$. We progress in time by defining a sequence of 
advancing fronts $\tau_i$. A front $\tau_i$ is given as a standard nodal finite 
element function on this mesh. It is defined by storing the current time for every 
vertex of the mesh. We move from $\tau_i$ to the next front $\tau_{i+1}$ by moving 
one vertex forward in time, while keeping all other vertices fixed. We call the 
spacetime domain between $\tau_i$ and $\tau_{i+1}$ a tent. In Fig. 1, the red domain 
is the tent between $\tau_i$ and $\tau_{i+1}$. 

Its projection to the spatial domain is exactly the vertex patch $\omega_V$ around 
$V$ of the original mesh. The data to be stored for one tent are the bottom and top 
times of the central vertex, plus the times for all neighboring vertices. 

Note that although the algorithm is described sequentially, it is highly parallel. 
Vertices with graph-distance of at least two can be moved forward independently. 
For example, in Fig. 1, all blue tents can be built and processed in parallel. 

The distance for advancing a vertex is limited by the speed of light, a constraint 
often referred to in the literature as the causality condition. Under this condition, the 
Maxwell problem inside the tent is solvable using the initial conditions at the tent 
bottom. Thus, the top boundary is an outgoing boundary and no boundary conditions 
are needed there. 
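
The advancing-front construction described above can be sketched for a one-dimensional mesh as follows. This is a minimal sequential illustration, not the Netgen/NGSolve implementation: the function name `pitch_tents_1d`, the greedy choice of the lowest vertex, and the `safety` factor (which keeps the front slope strictly below 1/c) are assumptions made for this example only.

```python
def pitch_tents_1d(x, c, t_end, safety=0.5):
    """Greedy 1D tent pitching (illustrative sketch).

    x      : sorted vertex coordinates of the spatial mesh
    c      : wave speed; the causality condition bounds the front slope by 1/c
    t_end  : final time of the slab
    safety : hypothetical factor < 1 keeping the slope strictly below 1/c
    Returns the list of tents (vertex, bottom time, top time, neighbour times)
    and the final front.
    """
    n = len(x)
    tau = [0.0] * n                      # current advancing front
    tents = []
    while min(tau) < t_end:
        # pitch the vertex that currently lags behind the most
        v = min(range(n), key=lambda k: tau[k])
        nbs = [k for k in (v - 1, v + 1) if 0 <= k < n]
        # highest admissible time: the slope towards each neighbour is bounded
        top = min(tau[k] + safety * abs(x[k] - x[v]) / c for k in nbs)
        top = min(top, t_end)
        tents.append((v, tau[v], top, {k: tau[k] for k in nbs}))
        tau[v] = top
    return tents, tau
```

In this sketch tents are produced one after another; a parallel version would instead pitch all vertices that are local minima of the front and have graph distance of at least two from each other.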


Fig. 1 Tent pitched spacetime mesh for a one-dimensional spatial mesh 
Note that the spatial mesh in Fig. 1 is refined towards the right boundary, which 
leads to smaller tent heights at the right boundary. Hence, smaller time steps in 
locally refined regions are a very natural feature of tent pitching methods. 


3 The MTP Discretization 


Now, we consider the discretization method for one tent domain 
$K = \{(x,t) : x \in \omega_V,\ \varphi_b(x) \le t \le \varphi_t(x)\}$, where 
$\omega_V$ is the union of elements containing the vertex $V$, and $\varphi_b$ and 
$\varphi_t$ are the bottom and top fronts, respectively, restricted to $\omega_V$. 
Our aim is to numerically solve the Maxwell system on $K$, namely 

$$\varepsilon\,\partial_t E = \nabla \times H, \qquad \mu\,\partial_t H = -\nabla \times E, \tag{1}$$

where boundary values for both fields are given at the tent bottom and 
$\nabla = \nabla_x$ denotes the spatial gradient. 

The approach of MTP schemes is to map the tent domain to a spacetime cylinder 
$\omega_V \times (0,1)$ and solve the transformed equation there. The transformation 
from the cylinder to the tent is denoted by $\Phi : \omega_V \times (0,1) \to K$ and 
is defined by $\Phi(x,\hat t) = (x, \varphi(x,\hat t))$, where 

$$\varphi(x,\hat t) = (1-\hat t)\,\varphi_b(x) + \hat t\,\varphi_t(x).$$


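It is similar to the Duffy transformation mapping a square to a triangle (see Fig. 2). 
With the notation 

$$\operatorname{skew} E = \begin{pmatrix} 0 & E_z & -E_y \\ -E_z & 0 & E_x \\ E_y & -E_x & 0 \end{pmatrix},$$

we can rephrase the curl operator as $\nabla \times E = \operatorname{div}\operatorname{skew} E$, 
where the divergence of the matrix function is taken row-wise. To simplify notation 
further, we define $u : K \to \mathbb{R}^6$ by $u = (E, H)$, and set 
$g : K \to \mathbb{R}^6$ and $f : K \to \mathbb{R}^{6\times 3}$ by 

$$g(u) = \begin{pmatrix} \varepsilon E \\ \mu H \end{pmatrix}, \qquad f(u) = \begin{pmatrix} -\operatorname{skew} H \\ \operatorname{skew} E \end{pmatrix}. \tag{2}$$

Then (1) may be rewritten as the conservation law 
$\partial_t g(u) + \operatorname{div}_x f(u) = 0$. 
Furthermore, we define $F(u) \in \mathbb{R}^{6\times 4}$ as 

$$F(u) = \big[\,f(u)\ \ g(u)\,\big] = \begin{pmatrix} -\operatorname{skew} H & \varepsilon E \\ \operatorname{skew} E & \mu H \end{pmatrix},$$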
Fig. 2 Tent mapped from a tensor product domain 


which allows us to write Maxwell's system (1) as the spacetime conservation law 

$$\operatorname{div}_{x,t} F(u) = 0. \tag{3}$$

For each row of $F$, the spacetime divergence $\operatorname{div}_{x,t}$ sums the 
spatial divergence of the first three components with the time-derivative of the 
last component. 

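Now, we apply the Piola transformation to pull back $F$ from the tent $K$ to the 
cylinder using the mapping $\Phi$. The derivative of $\Phi$ and its transposed 
inverse are 

$$\Phi' = \begin{pmatrix} I & 0 \\ \nabla\varphi^T & \delta \end{pmatrix} \qquad\text{and}\qquad (\Phi')^{-T} = \begin{pmatrix} I & -\delta^{-1}\nabla\varphi \\ 0 & \delta^{-1} \end{pmatrix}.$$

The Piola transform of $F$ is 
$\hat F(\hat u) = (\det\Phi')\,(F\circ\Phi)\,(\Phi')^{-T}$ with $\hat u = u\circ\Phi$. 
Since the Piola transform provides an algebraic transformation of the divergence, 
Eq. (3) is simply transformed to $\operatorname{div}_{x,\hat t}\hat F(\hat u) = 0$ on 
the spacetime cylinder. Then, inserting the Jacobian of $\Phi$ leads us to the 
transformed equation 

$$\partial_{\hat t}\big[g(\hat u) - f(\hat u)\nabla\varphi\big] + \operatorname{div}_x\big(\delta f(\hat u)\big) = 0, \tag{4}$$

where $\delta(x) = \varphi_t(x) - \varphi_b(x)$ is the local height of the tent. Note 
that $\nabla\varphi$ is an affine-linear function in quasi-time $\hat t$. Equation (4) 
describes the evolution of $\hat u$ along quasi-time from $\hat t = 0$ to 
$\hat t = 1$. Details of the calculations are given in [2]. 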
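
As a brief worked step (a sketch of the algebra only, using $\det\Phi' = \delta$, which follows from the block-triangular form of $\Phi'$), the transformed equation can be obtained by multiplying out the Piola transform column by column:

```latex
\hat F(\hat u)
  = \delta\,\big[\,f(\hat u)\ \ g(\hat u)\,\big]
    \begin{pmatrix} I & -\delta^{-1}\nabla\varphi \\ 0 & \delta^{-1} \end{pmatrix}
  = \big[\ \delta f(\hat u)\ \ \ g(\hat u) - f(\hat u)\nabla\varphi\ \big],
\qquad
\operatorname{div}_{x,\hat t}\hat F(\hat u)
  = \operatorname{div}_x\big(\delta f(\hat u)\big)
  + \partial_{\hat t}\big[g(\hat u) - f(\hat u)\nabla\varphi\big] = 0 .
```

The first three columns carry the spatial flux scaled by the tent height, and the last column is the quantity that is advanced in quasi-time.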

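The next step is the space discretization of (4) by a standard discontinuous 
Galerkin method. Let $V_h \subset [L_2]^6$ be the DG finite element space of degree 
$p$ on $\mathcal{T}$. On each tent we search for $\hat u : [0,1] \to V_h$ such that 

$$\int_{\omega_V} \partial_{\hat t}\big[g(\hat u) - f(\hat u)\nabla\varphi\big]\cdot v_h \;-\; \sum_{T\subset\omega_V}\int_T \delta f(\hat u) : \nabla v_h \;+\; \sum_{F\subset\omega_V}\int_F \delta f_n(\hat u^+,\hat u^-)\cdot[v_h] = 0$$

holds for all $v_h \in V_h$ and all $\hat t \in [0,1]$. Only the restriction of $V_h$ 
on the patch $\omega_V$ is used in this equation. The numerical flux 
$f_n(\hat u^+,\hat u^-)$ depends on the positive trace 
$\lim_{s\to 0^+}\hat u(x + sn)$ and negative trace $\lim_{s\to 0^+}\hat u(x - sn)$, 
where $n$ is a unit 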
normal vector of arbitrary orientation to the face. The jump is defined as usual by 
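$[\hat u] := \hat u^+ - \hat u^-$ and the mean value by 
$\{\hat u\} := \tfrac12(\hat u^+ + \hat u^-)$. One example is the 
upwind flux [3, p. 434] 

$$f_n(\hat u^+,\hat u^-) = \begin{pmatrix} \{\hat H\}\times n + [\hat E_t] \\ -\{\hat E\}\times n + [\hat H_t] \end{pmatrix}$$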


with the tangential components $\hat E_t = -(\hat E\times n)\times n$ and 
$\hat H_t = -(\hat H\times n)\times n$ of $\hat E = E\circ\Phi$ and 
$\hat H = H\circ\Phi$. Note that the local tent height $\delta$ enters the boundary 
integrals as a multiplicative factor. At the outer boundary of the vertex patch we 
have $\delta = 0$, 
so the facet integrals on the outer boundary disappear. For the above semidiscrete 
system, initial values for the tent problem are given finite element functions at the 
tent bottom. The finite element solution on the tent top provides the initial conditions 
for the next level tent. Therefore, no projection of initial values is needed when 
propagating from one tent to the next. 

After the semi-discretization, as usual, we are left to solve a system of 
$N = \dim V_h(\omega_V)$ ordinary differential equations for 
$U : [0,1] \to \mathbb{R}^N$, 

$$\frac{d}{d\hat t}\big[M(\hat t)\,U(\hat t)\big] - A\,U(\hat t) = 0, \qquad \hat t \in (0,1), \tag{5}$$

given $U(0)$. The non-standard feature of (5) is that $M$ is an affine-linear function 
of the quasi-time $\hat t$ (since our mapping enters the mass matrix $M$ through 
$\nabla\varphi$). The matrix $A$ is independent of $\hat t$. A straightforward 
approach is to substitute $Y = MU$ and solve 

$$\frac{d}{d\hat t}Y - A M^{-1} Y = 0,$$

instead of (5). Although first order convergence was observed with this strategy, 
further numerical studies showed reduced order of convergence if the stage-order 
of the Runge Kutta (RK) method is not high enough—see Fig. 3 (right). While the 
implicit MTP schemes discussed in [2] do not show this problem, the issue remains 
critical for explicit schemes. Thus, we propose to use a new type of explicit time- 
stepping for time discretization, discussed next. 


4 Structure-Aware Taylor Time-Stepping 


Returning to the ordinary differential equation (5) and continuing to make the 
substitution $Y = MU$, we now reconsider the previous equation as the following 
differential-algebraic system: 

$$\frac{d}{d\hat t}Y = AU, \qquad Y = MU. \tag{6}$$
dt 
We begin by subdividing the interval $(0,1)$ into $m \in \mathbb{N}$ smaller 
intervals of size $1/m$, defined by 
$(\hat t_i, \hat t_{i+1}) = \big(\tfrac{i}{m}, \tfrac{i+1}{m}\big)$, for 
$i \in \mathbb{N}$ and $0 \le i \le m-1$. Recall that $A$ is independent of 
quasi-time $\hat t$, and $M$ is an affine function of $\hat t$, i.e., 

$$M(\hat t) = M_i + (\hat t - \hat t_i)\,M', \qquad \hat t \in (\hat t_i, \hat t_{i+1}),$$

where $M_i = M(\hat t_i)$ and the derivative $M'$ is a constant matrix. We want to 
design a time-stepping scheme that is aware of this structure. 

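Consider the approximations to $Y$, $U$ on $(\hat t_i, \hat t_{i+1})$ in the form of 
Taylor polynomials $Y_i$, $U_i$ of degree at most $q$, defined by 

$$Y_i(\hat t) = \sum_{n=0}^{q} \frac{(\hat t - \hat t_i)^n}{n!}\,Y_{i,n}, \qquad U_i(\hat t) = \sum_{n=0}^{q-1} \frac{(\hat t - \hat t_i)^n}{n!}\,U_{i,n}, \tag{7}$$

where $Y_{i,n} = Y_i^{(n)}(\hat t_i)$ and $U_{i,n} = U_i^{(n)}(\hat t_i)$. To find 
these derivatives, we differentiate both equations of (6) $n$ times to get 

$$Y^{(n+1)}(\hat t) = A\,U^{(n)}(\hat t), \qquad n \ge 0,$$
$$Y^{(n)}(\hat t) = M(\hat t)\,U^{(n)}(\hat t) + n\,M'\,U^{(n-1)}(\hat t), \qquad n \ge 1.$$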


For the second equation we used Leibniz's formula 
$(fg)^{(n)} = \sum_{k=0}^{n}\binom{n}{k} f^{(k)} g^{(n-k)}$, 
and the fact that $M$ is affine-linear. Evaluating these equations for the Taylor 
polynomials $Y_i$, $U_i$ at $\hat t = \hat t_i$, we obtain a recursive formula for 
$Y_{i,n}$ and $U_{i,n}$ in terms of $U_{i,n-1}$, namely 

$$\begin{aligned} Y_{i,n} &= A\,U_{i,n-1}, & 1 \le n \le q, \\ M_i\,U_{i,n} &= Y_{i,n} - n\,M'\,U_{i,n-1}, & 1 \le n \le q-1, \end{aligned} \tag{8}$$

for all $0 \le i \le m-1$. Given $Y_{0,0} = Y(\hat t_0)$ and 
$M_0 U_{0,0} = Y_{0,0}$, applying (8) with $i = 0$ gives the approximate functions 
$Y_0(\hat t)$, $U_0(\hat t)$ in the first subinterval $(\hat t_0, \hat t_1)$. 
The recursive formulas are initiated for later subintervals at $n = 0$ by 

$$Y_{i,0} = Y_{i-1}(\hat t_i), \qquad M_i\,U_{i,0} = Y_{i,0}, \qquad 1 \le i \le m-1. \tag{9}$$


After the final subinterval, we get $Y_{m-1}(\hat t_m)$, our approximation to 
$Y(1)$. We shall refer to the new time-stepping scheme generated by (8) as the 
$q$-stage SAT (structure-aware Taylor) time-stepping. 
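
The recursion (8)-(9) can be sketched for a small dense system as follows. This is only an illustration of the recurrences under stated assumptions: the function name `sat_time_step` and the use of dense NumPy matrices with `numpy.linalg.solve` are choices made here, whereas the implementation described in the text avoids assembling and inverting the matrices $M_i$.

```python
import numpy as np

def sat_time_step(M0, M1, A, Y0, q, m):
    """q-stage SAT time-stepping on one tent (illustrative dense sketch).

    M0, M1 : mass matrices at quasi-time 0 and 1 (M is affine in between)
    A      : quasi-time independent matrix of the semidiscrete system
    Y0     : Y = M U at the tent bottom
    q, m   : number of stages and number of subintervals of (0, 1)
    Returns the approximation of Y at the tent top (quasi-time 1).
    """
    dM = M1 - M0                      # constant derivative M'
    tau = 1.0 / m                     # length of one subinterval
    Y = np.array(Y0, dtype=float)
    for i in range(m):
        Mi = M0 + (i * tau) * dM      # M_i = M(t_i)
        U_prev = np.linalg.solve(Mi, Y)           # M_i U_{i,0} = Y_{i,0}
        Y_next = Y.copy()
        coeff = 1.0
        for n in range(1, q + 1):
            Yn = A @ U_prev                       # Y_{i,n} = A U_{i,n-1}
            coeff *= tau / n                      # tau^n / n!
            Y_next = Y_next + coeff * Yn          # evaluate Y_i at t_{i+1}
            if n <= q - 1:
                # M_i U_{i,n} = Y_{i,n} - n M' U_{i,n-1}
                U_prev = np.linalg.solve(Mi, Yn - n * dM @ U_prev)
        Y = Y_next
    return Y
```

For a scalar test with $M(\hat t) = 1 + \hat t/2$ and $A = -1$, the exact solution is $Y(\hat t) = Y(0)\,(1 + \hat t/2)^{-2}$, which gives a simple way to check the sketch.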

Note that $Y_{m-1}(\hat t_m)$ is our approximation to $Y = MU$ at the top of the 
tent. This value is then passed to the next tent in time. The time dependence of $M$ 
arises from the time dependence of $\nabla\varphi$. This gradient is continuous along 
spacetime lines of constant spatial coordinates. Therefore, when passing from one 
element of a tent to the same element within the next tent in time, $Y$ is 
continuous (since the solution $U$ is continuous). Of course, on flat fronts 
$\nabla\varphi = \nabla\tau = 0$, so there $M$ is just a diagonal matrix containing 
the material parameters. 

To briefly remark on the expected convergence rate of a $q$-stage SAT 
time-stepping, recall that due to the mapping of the MTP method we solve for 
$\hat u = u\circ\Phi$, which satisfies 
$\partial_{\hat t}^n\hat u = \delta^n(\partial_t^n u)\circ\Phi$. The causality 
condition implies that $\delta \to 0$ if the mesh size $h \to 0$. Thus we may expect 
the $n$th temporal derivative of $\hat u$, and correspondingly $U$, to go to zero at 
the rate $O(h^n)$. By using a $q$-stage SAT time-stepping, we approximate the first 
$q - 1$ terms of the exact Taylor expansion of $U$. Thus we expect the convergence 
rate to be $O(h^q)$, the size of the remainder term involving $U^{(q)}$. The next 
section provides numerical evidence for this. 

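Before concluding this section, we should note that in (8) and (9), we tacitly 
assumed that $M_i$ is invertible. Let us show that this is indeed the case whenever 
the causality condition (see Sect. 2) $|\nabla\varphi| < \sqrt{\varepsilon\mu}$ is 
fulfilled. At any quasi-time $\hat t$, given a 
$\hat w = (\hat w_E, \hat w_H) \in V_h$ whose coefficient vector in the basis 
expansion is $W \in \mathbb{R}^N$, consider the equation $M(\hat t)U = W$ for the 
coefficient vector $U$ of $\hat u \in V_h$. This equation, in variational form, is 

$$\int_{\omega_V}\big[g(\hat u) - f(\hat u)\nabla\varphi\big]\cdot\hat v = \int_{\omega_V}(\hat w_E, \hat w_H)\cdot\hat v \qquad\text{for all } \hat v \in V_h. \tag{10}$$

Let $a(\hat u, \hat v)$ denote the left hand side of (10). To prove solvability of 
(10), it suffices to prove that $a(\cdot,\cdot)$ is a coercive bilinear form on 
$[L_2]^6$ for any $\hat t$. By inserting 
$g(\hat u) = [\varepsilon\hat E, \mu\hat H]^T$ and 
$f(\hat u) = [-\operatorname{skew}\hat H, \operatorname{skew}\hat E]^T$ into 
$a(\hat u, \hat u)$, 

$$\begin{aligned} a(\hat u,\hat u) &= \int_{\omega_V}(\varepsilon\hat E - \hat H\times\nabla\varphi)\cdot\hat E + (\mu\hat H + \hat E\times\nabla\varphi)\cdot\hat H \\ &= \int_{\omega_V}\varepsilon\hat E\cdot\hat E + \mu\hat H\cdot\hat H + 2(\hat E\times\nabla\varphi)\cdot\hat H \\ &\ge \int_{\omega_V}\varepsilon\hat E\cdot\hat E + \mu\hat H\cdot\hat H - 2\,\frac{|\nabla\varphi|}{\sqrt{\varepsilon\mu}}\,|\sqrt{\varepsilon}\hat E|\,|\sqrt{\mu}\hat H|, \end{aligned}$$

where we used the Cauchy-Schwarz inequality and inserted $\sqrt{\varepsilon}$ and 
$\sqrt{\mu}$ to achieve the desired scaling. By applying Young's inequality and 
$|\nabla\varphi| < \sqrt{\varepsilon\mu}$, 

$$\begin{aligned} a(\hat u,\hat u) &\ge \int_{\omega_V}\varepsilon\hat E\cdot\hat E + \mu\hat H\cdot\hat H - \frac{|\nabla\varphi|}{\sqrt{\varepsilon\mu}}\big(\varepsilon\hat E\cdot\hat E + \mu\hat H\cdot\hat H\big) \\ &= \int_{\omega_V}\Big(1 - \frac{|\nabla\varphi|}{\sqrt{\varepsilon\mu}}\Big)\big(\varepsilon\hat E\cdot\hat E + \mu\hat H\cdot\hat H\big) \ \ge\ C\min(\varepsilon,\mu)\,\|\hat u\|_{L_2}^2, \end{aligned}$$

for some constant $C > 0$. Thus $M_i$ is invertible and the SAT time-stepping is well 
defined on all tents respecting the causality condition. 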
One may exploit the specific details of the Maxwell problem to avoid the 
assembly and the inversion of matrices $M_i$ (as we have done in our implementation). 
In fact, instead of (10), we can explicitly solve the corresponding exact 
undiscretized equation obtained by replacing $V_h$ by $[L_2]^6$ in (10). The solution 
$\hat u = (\hat E, \hat H)$ in closed form reads 

$$\hat E = \frac{1}{\varepsilon\mu - |\nabla\varphi|^2}\Big(I - \frac{1}{\varepsilon\mu}\nabla\varphi\,\nabla\varphi^T\Big)\big(\mu\,\hat w_E + \hat w_H\times\nabla\varphi\big),$$
$$\hat H = \frac{1}{\varepsilon\mu - |\nabla\varphi|^2}\Big(I - \frac{1}{\varepsilon\mu}\nabla\varphi\,\nabla\varphi^T\Big)\big(\varepsilon\,\hat w_H - \hat w_E\times\nabla\varphi\big).$$

We then perform a projection of these into $V_h$ to obtain the coefficients 
$U(\hat t)$. For uncurved elements, this just involves the inversion of a diagonal 
mass matrix. For the small number of curved elements, we use a highly optimized 
algorithm which uses an approximation instead of the exact inverse mass matrix. 
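
The closed-form solution can be checked pointwise against the system it is meant to solve, namely $\varepsilon\hat E - \hat H\times\nabla\varphi = \hat w_E$ and $\mu\hat H + \hat E\times\nabla\varphi = \hat w_H$. The sketch below is an assumption-laden illustration: the function name `solve_pointwise` and the use of plain NumPy vectors for the field values at a single point are choices made for this example.

```python
import numpy as np

def solve_pointwise(w_E, w_H, grad_phi, eps, mu):
    """Closed-form solution (E, H) of
         eps*E - H x grad_phi = w_E,   mu*H + E x grad_phi = w_H
    at a single point, valid when |grad_phi|^2 < eps*mu (causality)."""
    g = np.asarray(grad_phi, dtype=float)
    w_E = np.asarray(w_E, dtype=float)
    w_H = np.asarray(w_H, dtype=float)
    proj = np.eye(3) - np.outer(g, g) / (eps * mu)
    scale = 1.0 / (eps * mu - g @ g)
    E = scale * proj @ (mu * w_E + np.cross(w_H, g))
    H = scale * proj @ (eps * w_H - np.cross(w_E, g))
    return E, H

# Residual check at one sample point (values chosen arbitrarily)
eps, mu = 2.0, 1.5
g = np.array([0.3, -0.2, 0.4])          # |g|^2 = 0.29 < eps*mu = 3
w_E = np.array([1.0, 2.0, -1.0])
w_H = np.array([0.5, -0.7, 0.2])
E, H = solve_pointwise(w_E, w_H, g, eps, mu)
res_E = eps * E - np.cross(H, g) - w_E
res_H = mu * H + np.cross(E, g) - w_H
```

Both residual vectors should vanish up to rounding, which confirms that no matrix needs to be assembled or inverted for this step.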


5 Numerical Results 


The MTP discretization in combination with the SAT time-stepping on tents is 
implemented within the Netgen/NGSolve finite element library. In this section 
numerical results concerning accuracy as well as performance are reported. 


5.1 Convergence Studies in Two Space Dimensions 


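We consider the model problem in two space dimensions 

$$\partial_t E_z = \partial_x H_y - \partial_y H_x, \qquad \partial_t H_x = -\partial_y E_z, \qquad \partial_t H_y = \partial_x E_z,$$

on the spacetime cube $[0,\pi]^2 \times [0, \sqrt{2}\pi]$. Parameters are set to 
$\varepsilon = \mu = 1$ such that the speed of light is $c = 1$. Initial and boundary 
values are set such that the exact solution is given by 

$$E_z = \sin(x)\sin(y)\cos(\sqrt{2}\,t),$$
$$H_x = -\tfrac{1}{\sqrt{2}}\sin(x)\cos(y)\sin(\sqrt{2}\,t),$$
$$H_y = \tfrac{1}{\sqrt{2}}\cos(x)\sin(y)\sin(\sqrt{2}\,t).$$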


Based on a spatial mesh with mesh size $h$, we generate a tent pitched mesh 
such that the maximal slope $|\nabla\varphi|$ is bounded by $(2c)^{-1}$ and apply a 
discontinuous Galerkin method in space using polynomials of order $p$, with 
$1 \le p \le 4$. On each cylinder we perform a $(p+1)$-stage SAT time-stepping with 
$m = 2p$ intervals. The spatial $L_2$ error of all field components at the final time 
is reported in the left plot of Fig. 3. We observe that the error goes to zero at the 
optimal rate of $O(h^{p+1})$ until we are close to machine precision. 


Fig. 3 Spatial $L_2$ error of all field components over degrees of freedom (dof) for 
the $(p+1)$-stage SAT time-stepping (left) and the classical Runge-Kutta method 
(right), for $p = 1, \dots, 4$ 

In contrast, the right plot in Fig.3 illustrates the previously mentioned loss of 
convergence rates when the classical Runge-Kutta method is used. The convergence 
rates stagnate at first order no matter what p is used. A similar behavior was also 
observed for other explicit Runge-Kutta methods. 
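
As a sanity check of the manufactured solution used in this convergence study, one can verify numerically that $E_z = \sin x\sin y\cos(\sqrt2 t)$, $H_x = -\tfrac{1}{\sqrt2}\sin x\cos y\sin(\sqrt2 t)$ and $H_y = \tfrac{1}{\sqrt2}\cos x\sin y\sin(\sqrt2 t)$ satisfy the two-dimensional model problem with $\varepsilon = \mu = 1$. The sketch below uses central finite differences; the helper names `d` and `residuals` and the step size are assumptions made for this illustration.

```python
import math

SQ2 = math.sqrt(2.0)

def E_z(x, y, t):
    return math.sin(x) * math.sin(y) * math.cos(SQ2 * t)

def H_x(x, y, t):
    return -math.sin(x) * math.cos(y) * math.sin(SQ2 * t) / SQ2

def H_y(x, y, t):
    return math.cos(x) * math.sin(y) * math.sin(SQ2 * t) / SQ2

def d(f, point, k, step=1e-5):
    """Central difference of f with respect to argument k (0: x, 1: y, 2: t)."""
    hi, lo = list(point), list(point)
    hi[k] += step
    lo[k] -= step
    return (f(*hi) - f(*lo)) / (2.0 * step)

def residuals(x, y, t):
    """Residuals of the three equations of the 2D model problem."""
    p = (x, y, t)
    r1 = d(E_z, p, 2) - (d(H_y, p, 0) - d(H_x, p, 1))
    r2 = d(H_x, p, 2) + d(E_z, p, 1)
    r3 = d(H_y, p, 2) - d(E_z, p, 0)
    return r1, r2, r3
```

Evaluating `residuals` at any interior point should return values at the level of the finite-difference error.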


5.2 Large Scale Problem in Three Space Dimensions 


As a second example we present a simulation on a domain similar to the resonator 
shown in [4]. The geometry is given as a body of revolution of smooth B-spline 
curves. The mesh consisting of 489,593 curved tetrahedral elements is shown in 
Fig. 4. Due to higher curvature the mesh is refined along the inner roundings, where 
the ratio of the largest to the smallest element is approximately 5:1. We used a 
Gaussian peak (located at the axis of revolution and the position of the fifth inner 
rounding) for the electric field as initial data. The explicit MTP scheme with SAT 
time-stepping then computed the solution at $t = 260$ using time slabs of height 
1, with each slab composed of $N_{tents} = 149{,}072$ tents. On each tent we used a 
$(p+1)$-stage SAT time-stepping with $m = 2p$ intervals, where $p$ denotes the spatial 
polynomial order. With the spatial degrees of freedom $N_{dof,i}$ of the $i$th tent 
and the 


Fig. 4 Tetrahedral mesh with 489 K curved elements, ratio of the largest to the smallest element of 
approximately 5:1 and the Н, component of solution at t = 260 calculated with spatial polynomial 
order p = 3 


Table 1 Number of degrees of freedom and simulation times for spatial polynomial orders 


р=2,3 
Number of spatial dof 2.938 x 107 
Number of spacetime dof per slab 1.908 x 10? 


Simulation time per slab 4.6s 
Total simulation time 20 min 


This data was generated using a shared memory server with 4 E7-8867 CPUs with 16 cores each 


number of stages $q = p + 1$, we obtain the total spacetime degrees of freedom per 
time slab 

$$\sum_{i=1}^{N_{tents}} N_{dof,i}\,m\,q = \Big(\sum_{i=1}^{N_{tents}} N_{dof,i}\Big)\,2p(p+1).$$


The corresponding numbers of degrees of freedom and the simulation times are 
shown in Table 1. In [4] a similar problem is solved using a discontinuous Galerkin 
method with quadratic elements, combined with a polynomial Krylov subspace 
method in time. Using 96 cores it took them 7:10 h to reach the final time. Our 
simulation with polynomial order $p = 3$, which has a comparable number of 
unknowns, took 3:33 h on 64 cores. This significant speedup is an illustration of 
the capability of the new method. The Н, component of the obtained solution at 
t — 260, using third order polynomials in space, is shown in Fig. 4. 
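
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes a full polynomial space of degree $p$ on each tetrahedron for each of the six field components (an assumption, although it reproduces the spatial dof count of $2.938\times10^7$ listed in Table 1 when $p = 2$); the function names are hypothetical.

```python
def tet_dofs_per_element(p, components=6):
    """Dofs of a full degree-p polynomial space on a tetrahedron,
    multiplied by the number of field components (E and H: 6)."""
    return components * (p + 1) * (p + 2) * (p + 3) // 6

def sat_factor(p):
    """Factor m*q = 2p(p+1) multiplying the spatial dofs of each tent."""
    m, q = 2 * p, p + 1
    return m * q

n_elements = 489_593
spatial_dof_p2 = n_elements * tet_dofs_per_element(2)
print(spatial_dof_p2)                  # 29375580, i.e. about 2.938e7
print(sat_factor(2), sat_factor(3))    # 12 24
print(260 * 4.6 / 60)                  # roughly 19.9 minutes for 260 slabs of 4.6 s
```

The last line is consistent with the total simulation time of about 20 min listed in Table 1 for 260 time slabs of unit height.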
Viscous Diffusion Effects 
in the Eigenanalysis of 
(Hybridisable) DG Methods 


Rodrigo C. Moura, Pablo Fernandez, Gianmarco Mengaldo, 
and Spencer J. Sherwin 


1 Introduction 


When numerically solving partial differential equations, numerical errors are likely 
to impact not only solution accuracy, but also the stability/robustness of the 
computation. This is particularly the case in eddy-resolving approaches to turbulent 
flows, such as large-eddy simulation (LES) and direct numerical simulation (DNS). 
Also, in the so-called implicit LES / under-resolved DNS strategies [1], where 
numerical error (specifically dissipation) provides small-scale regularisation in lieu 
of a turbulence model, understanding the nature of numerical errors is crucial. 
These typically appear in the form of dispersion and diffusion errors, where the 
former distorts the solution, while the latter is responsible for its damping. A useful 
framework for the assessment of such numerical errors is the eigensolution analysis 
technique [2, 3]. 
We present the first eigenanalysis of hybridisable discontinuous Galerkin (HDG) 
methods. This is also one of the first studies to consider viscous diffusion effects in 
the eigenanalysis of discontinuous SEM (spectral element methods), as it addresses 
the advection-diffusion equation in one dimension. Focus is given to the temporal 
analysis approach [2, 5], which is suited for problems with periodic boundary 
conditions. The spatial analysis [3, 4], suited for inflow-outflow problems, will 
be considered in subsequent studies. Here, we offer preliminary results on (i) 
the effects of the Péclet number (a cell-based Reynolds number), and (ii) the 
interplay between upwind (numerical) dissipation and viscous (physical) diffusion. 
We highlight how these results improve upon our understanding and practice of 
implicit LES / under-resolved DNS approaches. 

We note that, although a non-modal eigenanalysis strategy better suited for turbu- 
lence computations has been recently proposed [6], the present work will focus on 
more fundamental aspects and follow therefore the classical eigenanalysis. Finally, 
the results presented here are representative of a broader class of discontinuous 
SEM, given the well established connections within this class—see e.g. [7]. 

This paper is organized as follows. Section 2 introduces the HDG discretisation 
as applied to the linear advection-diffusion equation in one dimension. Section 3 
details the temporal eigenanalysis framework and presents our preliminary results. 
Finally, in Sect. 4, our conclusions are summarised and future research topics are 
outlined. 


2 HDG Discretisation 


In one dimension, the linear advection-diffusion equation is given by 

$$\frac{\partial u}{\partial t} + a\,\frac{\partial u}{\partial x} = \mu\,\frac{\partial^2 u}{\partial x^2}, \tag{1}$$

where the advection velocity $a$ and the viscosity $\mu$ are positive constants. This 
equation can be written in conservation form through the flux function 
$f(u, g) = au - \mu g$, as the system 

$$\frac{\partial u}{\partial t} + \frac{\partial f}{\partial x} = 0, \tag{2}$$
$$g - \frac{\partial u}{\partial x} = 0, \tag{3}$$


where g is the auxiliary gradient variable. The discretisation procedure is similar to 
that of traditional DG methods. 
After the (1D) physical domain is partitioned into non-overlapping elemental 
regions $\Omega$ of size $h$, the numerical solution and its gradient are locally 
approximated by polynomial expansions in the form 

$$u|_\Omega = \sum_{j=0}^{P}\hat u_j(t)\,\phi_j(\xi), \qquad g|_\Omega = \sum_{j=0}^{P}\hat g_j(t)\,\phi_j(\xi), \tag{4}$$

where $\phi_j$ are polynomial basis functions of degree up to $P$, defined in the 
standard domain $\Omega_{st} = [-1, 1]$. A linear mapping relation is assumed between 
the physical coordinate $x$ of element $\Omega$ and the coordinate 
$\xi \in \Omega_{st}$. 

Multiplying Eqs. (2)-(3) by $\phi_i$, integrating over element $\Omega$ and applying 
integration by parts leads respectively to 

$$\frac{h}{2}\int_{\Omega_{st}}\frac{\partial u}{\partial t}\,\phi_i\,d\xi + \big(\tilde f\phi_i\big)\Big|_{\ominus}^{\oplus} = \int_{\Omega_{st}} f\,\frac{\partial\phi_i}{\partial\xi}\,d\xi, \tag{5}$$
$$\frac{h}{2}\int_{\Omega_{st}} g\,\phi_i\,d\xi + \int_{\Omega_{st}} u\,\frac{\partial\phi_i}{\partial\xi}\,d\xi = \big(\tilde u\phi_i\big)\Big|_{\ominus}^{\oplus}, \tag{6}$$

where $\ominus$ and $\oplus$ denote the left and right boundaries of element 
$\Omega$, in that order. As is typical, the expansions in (4) are to be inserted into 
(5)-(6), which are then required to hold for $i = 0, \dots, P$. Note that the 
integrals above have been moved to $\Omega_{st}$ and interface quantities $\tilde u$ 
and $\tilde f$ have been introduced. The state average $\tilde u$ is peculiar to 
HDG in that it represents a uniquely defined interface variable whose value stems 
indirectly from the enforced continuity of the numerical flux $\tilde f$. This 
continuity ensures local conservation for HDG methods, regardless of the chosen flux 
formula. 

For the advection-diffusion problem at hand, the interface fluxes on either side 
of a given element (cf. Fig. 1, left diagram) can be taken in the form 

$$\tilde f_\oplus = f(\tilde u_\oplus, g_\oplus) - \tau(\tilde u_\oplus - u_\oplus), \tag{7}$$
$$\tilde f_\ominus = f(\tilde u_\ominus, g_\ominus) - \tau(u_\ominus - \tilde u_\ominus), \tag{8}$$

Fig. 1 Notation adopted for the element viewpoint (left) and the interface viewpoint 
(right) 

in which 

$$u_\oplus = \sum_{j=0}^{P}\hat u_j\,\phi_j(+1), \qquad g_\oplus = \sum_{j=0}^{P}\hat g_j\,\phi_j(+1), \tag{9}$$
$$u_\ominus = \sum_{j=0}^{P}\hat u_j\,\phi_j(-1), \qquad g_\ominus = \sum_{j=0}^{P}\hat g_j\,\phi_j(-1). \tag{10}$$

Also, $\tau = \beta|a| + \sigma$ is a stabilisation constant combining an upwinding 
parameter $\beta$ and a penalty term $\sigma$ that accounts for the partially 
diffusive character of the model equation considered. This work however assumes 
$\sigma = 0$ as it focuses on advection-dominated cases, which are typically stable 
without the penalty term $\sigma$, even within the context of turbulence 
simulations [8]. 

Flux formulas (7)-(8) are inspired by Ref. [9]. In the case of pure advection (with 
$\sigma = 0$), the interface solution variable becomes the simple average 
$\tilde u = \tfrac12(u_\oplus^L + u_\ominus^R)$ of the adjacent states from the left 
(L) and right (R) elements sharing the considered interface. In this case, it is also 
easy to show that the fluxes in (7)-(8) recover those used in traditional DG methods, 
whereby HDG exactly reproduces DG. This does not hold, however, when diffusion is 
taken into account, in which case $\tilde u$ is only implicitly defined from the flux 
continuity condition enforced at interfaces, 
$\tilde f_\oplus^L = \tilde f_\ominus^R$, namely 

$$a\tilde u - \mu g_\oplus^L - \tau(\tilde u - u_\oplus^L) = a\tilde u - \mu g_\ominus^R - \tau(u_\ominus^R - \tilde u), \tag{11}$$

where $g_\oplus^L$ and $g_\ominus^R$ depend on values of $\tilde u$ at two other 
interfaces via (6). The diagram on the right-hand side of Fig. 1 should help clarify 
the notation adopted. 

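Using vectors $\hat u = \{\hat u_0, \dots, \hat u_P\}^T$ and 
$\hat g = \{\hat g_0, \dots, \hat g_P\}^T$, the flux continuity condition (11) 
becomes 

$$\tilde u = \frac12\big(\phi_\oplus^T\hat u^L + \phi_\ominus^T\hat u^R\big) + \frac{\mu}{2\tau}\big(\phi_\ominus^T\hat g^R - \phi_\oplus^T\hat g^L\big), \tag{12}$$

where $\phi_\oplus = \{\phi_0(+1), \dots, \phi_P(+1)\}^T$ and 
$\phi_\ominus = \{\phi_0(-1), \dots, \phi_P(-1)\}^T$. Likewise, (6) can be written as 

$$\frac{h}{2}M\hat g + D\hat u = \phi_\oplus\tilde u_\oplus - \phi_\ominus\tilde u_\ominus, \tag{13}$$

in which matrices $M$ and $D$ have been introduced, namely 

$$M_{ij} = \int_{\Omega_{st}}\phi_i\,\phi_j\,d\xi, \qquad D_{ij} = \int_{\Omega_{st}}\phi_j\,\frac{\partial\phi_i}{\partial\xi}\,d\xi. \tag{14}$$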
Finally, (5) becomes 

$$\frac{h}{2}M\frac{d\hat u}{dt} + \phi_\oplus\tilde f_\oplus - \phi_\ominus\tilde f_\ominus = aD\hat u - \mu D\hat g, \tag{15}$$

with 

$$\tilde f_\oplus = a\tilde u_\oplus - \mu\,\phi_\oplus^T\hat g - \tau\big(\tilde u_\oplus - \phi_\oplus^T\hat u\big), \tag{16}$$
$$\tilde f_\ominus = a\tilde u_\ominus - \mu\,\phi_\ominus^T\hat g - \tau\big(\phi_\ominus^T\hat u - \tilde u_\ominus\big). \tag{17}$$

Note that (12) is a scalar equation written from the point of view of a given interface, 
whereas (13) and (15) are vector equations written from the viewpoint of an arbitrary 
element $\Omega$ of size $h$. 

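It is now convenient to eliminate $\hat g$ and work with variables $\hat u$ and 
$\tilde u$ alone. This can be done by solving (13) for $\hat g$ and substituting the 
resulting expression in both (12) and (15). The former substitution leads, after some 
algebra, to 

$$\Big(\beta + \frac{m_\oplus^\oplus + m_\ominus^\ominus}{Pe}\Big)\tilde u - \frac{m_\ominus^\oplus}{Pe}\,\tilde u^R - \frac{m_\oplus^\ominus}{Pe}\,\tilde u^L = \phi_\oplus^T B_\oplus\hat u^L + \phi_\ominus^T B_\ominus\hat u^R, \tag{18}$$

where $Pe = |a|h/\mu$ denotes the Péclet number, for which a uniform mesh spacing 
is assumed. Moreover, four scalar constants 'm' have been introduced, defined as 

$$m_\oplus^\oplus = \phi_\oplus^T M^{-1}\phi_\oplus, \quad m_\oplus^\ominus = \phi_\oplus^T M^{-1}\phi_\ominus, \quad m_\ominus^\oplus = \phi_\ominus^T M^{-1}\phi_\oplus, \quad m_\ominus^\ominus = \phi_\ominus^T M^{-1}\phi_\ominus. \tag{19}$$

In addition, the following matrices appear in (18) 

$$B_\oplus = \frac{\beta}{2}I + \frac{M^{-1}D}{Pe}, \qquad B_\ominus = \frac{\beta}{2}I - \frac{M^{-1}D}{Pe}. \tag{20}$$

Note that (18) relates the solution vectors $\hat u$ of two adjacent elements 
($\Omega_L$ and $\Omega_R$) with the three interface states $\tilde u$ associated to 
the boundaries of these elements. 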

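The second step consists in using $\hat g$ from (13) in (15), not forgetting to take 
the fluxes (16)-(17) into account. After some more algebra, one arrives at 

$$\frac{h}{2a}M\frac{d\hat u}{dt} + A\hat u = A_\oplus\phi_\oplus\tilde u_\oplus + A_\ominus\phi_\ominus\tilde u_\ominus, \tag{21}$$

whose matrices now introduced are given by 

$$A = \beta\big(\Phi_\oplus^\oplus + \Phi_\ominus^\ominus\big) + \big(2Pe^{-1}N - I\big)D, \tag{22}$$
$$A_\oplus = (\beta - 1)I + 2Pe^{-1}N, \qquad A_\ominus = (\beta + 1)I - 2Pe^{-1}N, \tag{23}$$

where 

$$\Phi_\oplus^\oplus = \phi_\oplus\phi_\oplus^T, \qquad \Phi_\ominus^\ominus = \phi_\ominus\phi_\ominus^T, \qquad N = \big(\Phi_\oplus^\oplus - \Phi_\ominus^\ominus - D\big)M^{-1}. \tag{24}$$

Note that (21) links the solution vector $\hat u$ and its time derivative to the two 
interface variables $\tilde u$ at the boundaries of the considered element. 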

In the actual context of simulations, (21) would be first solved (analytically) for 
$\hat u$ after an implicit time-stepping scheme is chosen. This is possible since it 
entails expressing $d\hat u/dt$ in terms of $\hat u$ at the current as well as 
previous time levels. The next step would be to insert the resulting expression for 
$\hat u$ into (18), from which a scalar equation whose only unknowns are $\tilde u$ 
at various interfaces is obtained. This equation is finally used for the assembly of 
a global system given suitable boundary conditions, which can be solved via direct or 
iterative techniques. Since the system's solution grants $\tilde u$ for all 
interfaces, $\hat u$ can be obtained locally for each element from the time-discrete 
version of (21). The reader is referred to [9] for the details of this procedure. In 
this work, however, as we are interested in the eigenanalysis of HDG, a different 
strategy is adopted, as outlined next. 


3 Temporal Eigenanalysis 


In the eigenanalysis of spectral element methods [2, 5], it is typical to assume 
wave-like solutions in the form $\hat u \propto \exp[i(\kappa x - \omega t)]$, 
whereby $\hat u^L = \hat u\exp(-i\kappa h)$ and $\hat u^R = \hat u\exp(+i\kappa h)$. 
Here, $\hat u$ is the solution vector of a "central" element, whereas $\hat u^L$ 
and $\hat u^R$ refer to solution vectors of neighbouring elements from the left (L) 
and from the right (R), respectively. For the HDG formulation, an additional 
assumption can be made regarding a wave-like behaviour for $\tilde u$. We assume that 
$\tilde u^L = \tilde u\exp(-i\kappa' h)$ and $\tilde u^R = \tilde u\exp(+i\kappa' h)$, 
where now $\tilde u$ is the interface variable shared by two adjacent elements, 
whereas $\tilde u^L$ and $\tilde u^R$ refer to interface variables at the nearest 
interfaces from the left/right (L/R). This second assumption is only natural given 
the connection between $\tilde u$ and $\hat u$. Actually, we now show that 
$\kappa' = \kappa$, which is not surprising. 

We start from (21) assuming wave-like behaviour for $\hat u$, obtaining 

$$\Big(-i\frac{\omega h}{2a}M + A\Big)\hat u = A_\oplus\phi_\oplus\tilde u_\oplus + A_\ominus\phi_\ominus\tilde u_\ominus, \tag{25}$$

which uniquely defines $\hat u$ from $\tilde u_\oplus$ and $\tilde u_\ominus$. If the 
above is written for another element, say, the adjacent element from the right (a 
translation $x \mapsto x + h$), one has 

$$\Big(-i\frac{\omega h}{2a}M + A\Big)\hat u\exp(i\kappa h) = A_\oplus\phi_\oplus\tilde u_\oplus\exp(i\kappa' h) + A_\ominus\phi_\ominus\tilde u_\ominus\exp(i\kappa' h), \tag{26}$$




which then implies 

$$\Big(-i\frac{\omega h}{2a}M + A\Big)\hat u\,\frac{\exp(i\kappa h)}{\exp(i\kappa' h)} = A_\oplus\phi_\oplus\tilde u_\oplus + A_\ominus\phi_\ominus\tilde u_\ominus = \Big(-i\frac{\omega h}{2a}M + A\Big)\hat u,$$

where (25) has been used on the right-hand side. Comparing the left- and right-most 
expressions above leads to $\exp(i\kappa' h) = \exp(i\kappa h)$, which means 
$\kappa' h = \kappa h + 2n\pi$, for $n$ integer. This phase ambiguity can be sorted 
out by the evaluation of the $x$-derivative of (25) at $x + h$, given by 

$$i\kappa\Big(-i\frac{\omega h}{2a}M + A\Big)\hat u\exp(i\kappa h) = i\kappa'\big(A_\oplus\phi_\oplus\tilde u_\oplus + A_\ominus\phi_\ominus\tilde u_\ominus\big)\exp(i\kappa' h),$$

which yields $\kappa' = \kappa$. This last step about the phase is, however, not 
really necessary to the eigenanalysis because only the complex exponential factors 
appear throughout the relevant equations, hence knowing that 
$\exp(i\kappa' h) = \exp(i\kappa h)$ is sufficient. 

In the remainder of the study, orthonormal Legendre basis functions are assumed, 
whereby $M = I$. We note that numerical dispersion and diffusion eigencurves, 
which are the focus of the study, do not change depending on the basis functions 
adopted, provided that exact integrations are used in the spatial discretisation. 

In the temporal analysis, an eigenvalue problem is set where, given a real-valued 
wavenumber $\kappa$, multiple ($P + 1$) eigenvalues of the relevant eigenmatrix 
are associated to admissible complex-valued numerical frequencies 
$\omega = \omega(\kappa)$. The procedure to obtain this eigenvalue problem is 
described below. 

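We begin from (18), assuming $\tilde u^L = \tilde u\exp(-i\kappa h)$ and 
$\tilde u^R = \tilde u\exp(i\kappa h)$, to find 

$$\tilde u = \big(\phi_\oplus^T B_\oplus\hat u^L + \phi_\ominus^T B_\ominus\hat u^R\big)\,b^{-1}, \tag{29}$$

with scalar $b = b(\kappa h; Pe, \beta)$ defined as 

$$b = \beta + \big[m_\oplus^\oplus + m_\ominus^\ominus - m_\ominus^\oplus\exp(i\kappa h) - m_\oplus^\ominus\exp(-i\kappa h)\big]\,Pe^{-1}. \tag{30}$$

Then, (29) is used in (21), relating the solution vector $\hat u$ at a given element 
to the state vectors of its left ($\hat u^L$) and right ($\hat u^R$) neighbours. From 
the wave-like behaviour of $\hat u$ and the relations 
$\hat u^L = \hat u\exp(-i\kappa h)$ and $\hat u^R = \hat u\exp(+i\kappa h)$, one can 
arrive at 

$$-i\bar\omega h\,\hat u = Z\hat u, \tag{31}$$

where $\bar\omega = \omega/a$ and matrix $Z = Z(\kappa h; Pe, \beta)$ is given by 

$$Z = 2b^{-1}\big[A_\oplus\Phi_\oplus^\ominus B_\ominus\exp(i\kappa h) + A_\ominus\Phi_\ominus^\oplus B_\oplus\exp(-i\kappa h) + A_\oplus\Phi_\oplus^\oplus B_\oplus + A_\ominus\Phi_\ominus^\ominus B_\ominus - bA\big], \tag{32}$$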
in which $\Phi_\oplus^\oplus$ and $\Phi_\ominus^\ominus$ are given by (24), whereas 

$$\Phi_\oplus^\ominus = \phi_\oplus\phi_\ominus^T, \qquad \Phi_\ominus^\oplus = \phi_\ominus\phi_\oplus^T. \tag{33}$$

In (31), we have the desired eigenvalue problem of size $P + 1$, which thus supports 
this same number of eigenvalues $\lambda_j$. These are related to the (normalised) 
numerical frequencies $\bar\omega_j$ via 

$$\bar\omega_j h = i\,\lambda_j\big(Z(\kappa h)\big). \tag{34}$$


Typically, one of the eigenvalues represents the so-called primary eigenmode, while 
the remaining ones can be regarded as secondary as they simply replicate the 
behaviour of the primary mode on shifted wavenumber ranges. This formally allows 
us to focus on the analysis of the primary eigenmode and on its dispersion and 
diffusion eigencurves. The reader is referred to [2, 5] for the concepts relevant to 
the separation of primary and secondary modes adopted in this work. 
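
The assembly of the eigenmatrix can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' code: it takes the orthonormal Legendre basis (so that $M = I$), uses the analytic entries $D_{ij} = \sqrt{(2i+1)(2j+1)}$ for $i > j$ with $i - j$ odd, and builds $Z$ from the matrices of (20), (22)-(24), (30) and (32)-(33) as read from the derivation in Sect. 2. The function name `hdg_frequencies` is hypothetical.

```python
import numpy as np

def hdg_frequencies(P, kh, Pe, beta=1.0):
    """Normalised numerical frequencies (omega/a)*h of the HDG scheme for the
    1D advection-diffusion equation, for wavenumber kh and Peclet number Pe.
    Orthonormal Legendre basis on [-1, 1] is assumed, hence M = I."""
    j = np.arange(P + 1)
    scale = np.sqrt((2 * j + 1) / 2.0)
    phi_p = scale.copy()                  # basis functions at xi = +1
    phi_m = scale * (-1.0) ** j           # basis functions at xi = -1
    D = np.zeros((P + 1, P + 1))          # D_ij = int phi_j dphi_i/dxi
    for i in range(P + 1):
        for k in range(i):
            if (i - k) % 2 == 1:
                D[i, k] = np.sqrt((2 * i + 1) * (2 * k + 1))
    I = np.eye(P + 1)
    Phi_pp, Phi_mm = np.outer(phi_p, phi_p), np.outer(phi_m, phi_m)
    Phi_pm, Phi_mp = np.outer(phi_p, phi_m), np.outer(phi_m, phi_p)
    N = Phi_pp - Phi_mm - D
    A = beta * (Phi_pp + Phi_mm) + (2.0 / Pe * N - I) @ D
    A_p = (beta - 1.0) * I + 2.0 / Pe * N
    A_m = (beta + 1.0) * I - 2.0 / Pe * N
    B_p = 0.5 * beta * I + D / Pe
    B_m = 0.5 * beta * I - D / Pe
    m_pp, m_mm, m_pm = phi_p @ phi_p, phi_m @ phi_m, phi_p @ phi_m
    e = np.exp(1j * kh)
    b = beta + (m_pp + m_mm - m_pm * e - m_pm / e) / Pe
    Z = (2.0 / b) * (A_p @ Phi_pm @ B_m * e + A_m @ Phi_mp @ B_p / e
                     + A_p @ Phi_pp @ B_p + A_m @ Phi_mm @ B_m - b * A)
    lam = np.linalg.eigvals(Z)
    return 1j * lam                       # omega_bar * h = i * lambda

w = hdg_frequencies(P=2, kh=0.5, Pe=10.0)
```

For a well-resolved wavenumber, one of the returned values should lie close to the exact frequency of the model problem, $\bar\omega h = \kappa h - i(\kappa h)^2/Pe$, while the others correspond to secondary modes.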

Once the primary mode is identified, the scheme's numerical diffusion behaviour 
can be assessed in wavenumber space through the imaginary part of $\bar\omega_*$, 
where the asterisk subscript denotes the primary mode from (34). Note that numerical 
diffusion is especially relevant to turbulence computations as it impacts not only 
accuracy, but also stability. Note also that eigencurves are entirely defined by the 
polynomial order $P$, the upwinding parameter $\beta$ and, in case viscosity is 
present, the normalised Péclet number $Pe^* = |a|\bar h/\mu$, with 
$\bar h = h/(P + 1)$. Standard upwinding is here assumed. 

Figure 2 depicts a comparison between HDG's primary dissipation curves for 
pure advection and for advection-diffusion at $Pe^* = 100$ for $P = 1$, 4 and 7. As 
explained further below, this is about the lowest value of $Pe^*$ one achieves 
(domain-wise) in a turbulent flow computation. However, at this $Pe^*$, viscous 
effects are still somewhat weak in regular (linear-scale) plots of 
$\bar\omega_i\bar h$ vs. $\kappa\bar h$, where $\bar\omega_i$ is the absolute value 
of $\bar\omega$'s imaginary part. This is especially true for $P < 4$. Hence, 
Fig. 2 also shows these plots in log-log scale, highlighting what happens at 
well-resolved wavenumbers. 

The log-log plots in Fig. 2 are revealing. They make clear that HDG's numerical
diffusion follows the correct diffusive behaviour up to a certain wavenumber,
hereinafter named $\kappa_c$, beyond which upwind dissipation overcomes viscous diffusion.
The exact diffusive behaviour, as derived from our model problem, is given by

$$\bar{\omega}_i\bar{h} = (\kappa\bar{h})^2/\mathrm{Pe}^* \quad\text{or}\quad \log_{10}(\bar{\omega}_i\bar{h}) = 2\log_{10}(\kappa\bar{h}) - \log_{10}(\mathrm{Pe}^*), \qquad (35)$$

showing that, as $\mathrm{Pe}^*$ increases, the reference line of exact diffusive behaviour shifts
downwards, reducing the value of $\kappa_c\bar{h}$. Also, for a given number of DOFs, i.e. fixed
$\bar{h}$, increasing the discretisation order increases $\kappa_c$. This type of analysis reveals how
upwind dissipation and viscous diffusion complement each other, allowing also
for the estimation of the wavenumber $\kappa_c$ after which upwinding dominates. The
latter, though important for small-scale regularisation and stability, is not entirely
physical in the sense of subgrid-scale modelling. Hence, $\kappa_c$ values could be used
as quality criteria for implicit LES / under-resolved DNS approaches based on
discontinuous SEM. For transitional flows, where small numerical dissipation is
particularly important, this kind of analysis might prove very useful. Although
specific estimates would be needed for different schemes, the analysis strategy
should be similar.

Fig. 2 Normalised numerical diffusion in bilinear (left) and log-log plots (right) for $P = 1$, $P = 4$
and $P = 7$ (top to bottom), with/without viscosity (dashed/full curve), the former considering
$\mathrm{Pe}^* = 100$. The exact diffusive behaviour is shown as a dotted parabola/line (left/right plots)
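The reference line (35) and the idea of a crossover wavenumber can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the relative threshold `rel_tol`, and the synthetic "numerical" curve are hypothetical and are not taken from the paper, where the crossover is read from the actual HDG eigencurves.

```python
import numpy as np


def exact_diffusion(kh_bar, pe_star):
    """Exact normalised diffusion from (35): (kappa*h_bar)**2 / Pe*."""
    return np.asarray(kh_bar, dtype=float) ** 2 / pe_star


def crossover_wavenumber(kh_bar, numerical, pe_star, rel_tol=0.1):
    """Smallest sampled kappa*h_bar at which `numerical` exceeds the exact
    line by more than `rel_tol` (relative); None if it never does."""
    kh_bar = np.asarray(kh_bar, dtype=float)
    exact = exact_diffusion(kh_bar, pe_star)
    above = np.asarray(numerical, dtype=float) > (1.0 + rel_tol) * exact
    return float(kh_bar[np.argmax(above)]) if above.any() else None


# Synthetic stand-in for a numerical eigencurve (NOT data from the paper):
# exact viscous diffusion plus a hypothetical high-order upwind-like term.
pe_star = 100.0
kh = np.linspace(0.01, np.pi, 1000)
synthetic = exact_diffusion(kh, pe_star) + 1e-3 * kh ** 6
kc = crossover_wavenumber(kh, synthetic, pe_star)
print(f"estimated crossover kappa_c * h_bar ~ {kc:.3f}")
```

With real eigencurve data, `synthetic` would be replaced by the sampled imaginary part of the primary mode; the choice of threshold remains a modelling decision.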

Finally, it is now explained why $\mathrm{Pe}^* = 100$ is about the lowest Péclet value one
may find in a turbulent flow simulation. As candidates for very small $\mathrm{Pe}^*$, one could
think of the near-wall region of turbulent boundary layers, given the low velocity and
small mesh spacing in typical wall-resolved LES. For the viscous sublayer, where
$u^+ < 5$, the streamwise Péclet number can be evaluated using wall quantities:

$$\mathrm{Pe}^* = \frac{|u|\,\bar{h}}{\nu} = \frac{u^+ u_\tau\,\bar{h}^+\delta_\nu}{\nu} = u^+\bar{h}^+, \qquad (36)$$

where by definition $\nu = u_\tau\delta_\nu$, with $u_\tau$ being the friction velocity and $\delta_\nu$ the associated
viscous lengthscale. Our argument is then concluded since $50 < \Delta x^+ = \bar{h}^+ < 150$
in typical wall-resolved LES or under-resolved DNS approaches, cf. e.g. [10].


4 Concluding Remarks 


We presented a preliminary study of the numerical dispersion and diffusion 
characteristics of HDG methods for linear advection-diffusion problems using 
the temporal eigenanalysis technique. To the authors' knowledge, this is the first 
eigenanalysis of HDG methods, and also one of the first of such analyses of a 
discontinuous SEM to consider viscous diffusion effects, cf. also [11]. 

It was shown that, for the range of Péclet numbers encountered in under- 
resolved turbulence simulations, upwind (numerical) dissipation dominates viscous 
(physical) diffusion in the smallest resolved scales. Only in the large scales, 
the effect of viscous diffusion becomes significant. The wavenumber beyond 
which upwind dissipation overcomes viscous diffusion, and its dependence on the 
polynomial order, can be estimated through eigenanalysis, and this can be used as a
quality criterion for LES and DNS in general, and for implicit LES/under-resolved
DNS in particular. 

Future work includes further analysing the interplay between viscous and upwind
diffusion, investigating other numerical fluxes (e.g. over-upwinding $\beta > 1$,
nearly central fluxes $\beta \approx 0$, non-zero viscous stabilization $\sigma \neq 0$), and testing
eigenanalysis against actual turbulence simulations. Finally, the dispersion-diffusion
characteristics of HDG methods for spatially developing simulations could be 
investigated using spatial eigenanalysis techniques. 
Spectral Galerkin Method for Solving
Helmholtz and Laplace Dirichlet
Problems on Multiple Open Arcs


Carlos Jerez-Hanckes and José Pinto 


1 Introduction 


We seek solutions of Helmholtz and Laplace equations in a two-dimensional plane 
after removing a finite collection of open finite curves—also called arcs. This 
setting can be found in areas such as structural and mechanical engineering [2], 
or biomedical imaging [11] to name a few. Such problems pose the following 
challenges: (1) unbounded domains, which call for boundary integral methods with 
carefully chosen radiation conditions; (2) singular behaviors of solutions near arc 
endpoints; and (3) a large number of degrees of freedom when the wavenumber or the
number of arcs increases.

Our approach is to recast the problem as a system of boundary integral equations 
defined on the arcs, so as to obtain an integral representation of the volume solution. 
Well-posedness for a single arc was proven in [9], with an extension to the multiple 
arcs case given in [5]. We will consider numerical approximations of the resulting 
surface densities based on Galerkin-Bubnov discretizations of the corresponding 
system of boundary integral equations. 

In the present note, we start by briefly introducing a spectral scheme to account 
for general arcs as well as for a wide wavenumber range. We show that significant 
reduction in both memory consumption and computational work can be achieved by 
an ad hoc matrix compression algorithm. Moreover, we establish detailed
interdependencies between compression parameters and accuracy. Numerical experiments
validate our claims and point out further improvements.


2 Continuous Model Problem 


Let the canonical domain $(-1, 1) \times \{0\}$ be denoted by $\hat{\Gamma}$. We say that $g : \hat{\Gamma} \to \mathbb{C}$
is $\rho$-analytic if the function $t \mapsto g(t, 0)$ can be extended to an analytic function
on the Bernstein ellipse of parameter $\rho > 1$ (cf. [10, Chapter 8]). We say that
$\Lambda \subset \mathbb{R}^2$ is a regular Jordan arc of class $C^m$, for $m \in \mathbb{N}$, if it is the image of a
bijective parametrization, denoted by $\mathbf{r} = (r_1, r_2)$, such that its components are
$C^m(\hat{\Gamma})$-functions, $\mathbf{r} : \hat{\Gamma} \to \Lambda$ and $\|\mathbf{r}'(t)\|_2 > 0$, $\forall t \in \hat{\Gamma}$, where $\|\cdot\|_2$ is the
Euclidean norm. Similarly, we define $\rho$-analytic arcs as those whose components
are $\rho$-analytic. Throughout, we will assume that for any regular Jordan arc $\Lambda$, there
exists an extension of $\Lambda$ to $\tilde{\Lambda}$, which is a closed curve and keeps the same regularity.

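Consider a finite number $M \in \mathbb{N}$ of at least $C^1$ arcs, written $\{\Gamma_i\}_{i=1}^{M}$, such that
their closures are mutually disjoint. Moreover, we assume that there are disjoint
domains $\Omega_i$ whose boundaries are given by extensions $\partial\Omega_i = \tilde{\Gamma}_i$, for $i = 1, \ldots, M$.
Let us define

$$\Gamma := \bigcup_{i=1}^{M} \Gamma_i \quad\text{and}\quad \Omega := \mathbb{R}^2 \setminus \overline{\Gamma}.$$

We say that $\Gamma$ is of class $C^m$, $m \in \mathbb{N}$, if each arc $\Gamma_i$ is of class $C^m$, and analogously for
the $\rho$-analytic case. For $i \in \{1, \ldots, M\}$, let $\mathbf{r}_i : \hat{\Gamma} \to \Gamma_i$ and $g_i : \Gamma_i \to \mathbb{C}$. We say
that $g = (g_1, \ldots, g_M)$ is of class $C^m(\Gamma)$ if $g_i \circ \mathbf{r}_i \in C^m(\hat{\Gamma})$, for $i \in \{1, \ldots, M\}$. A
similar definition holds for the analytic case.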

Let $G \subset \mathbb{R}^d$, $d = 1, 2$, be an open domain. For $s \in \mathbb{R}$, we denote by $H^s(G)$
the standard Sobolev spaces, by $H^s_{loc}(G)$ their locally integrable counterparts [8,
Section 2.3], and by $\tilde{H}^{-s}(G)$ the corresponding dual spaces. The corresponding
duality product (when the dual space of $L^2(G)$ is identified with itself) is denoted
$\langle\cdot,\cdot\rangle_G$. Finally, $\tilde{H}^s_{\langle 0\rangle}(G)$ refers to mean-zero spaces [5, Section 2.3]. We will also
make use of the following Hilbert space in $\mathbb{R}^2$:

$$W(G) := \left\{ U \in D^*(G) : \frac{U(x)}{\sqrt{1+\|x\|_2^2}\,\log(2+\|x\|_2^2)} \in L^2(G),\ \nabla U \in L^2(G) \right\},$$

where $D^*(G)$ is the dual space of $C_0^\infty(G) = \bigcap_{n\geq 1} C_0^n(G)$. For $s \in \mathbb{R}$ and for the
finite union of disjoint open arcs $\Gamma$, we define Cartesian product spaces as

$$\mathbb{H}^s(\Gamma) := H^s(\Gamma_1) \times H^s(\Gamma_2) \times \cdots \times H^s(\Gamma_M).$$
Spaces $\tilde{\mathbb{H}}^s(\Gamma)$ and $\tilde{\mathbb{H}}^s_{\langle 0\rangle}(\Gamma)$ are defined similarly. Also, $\mathbb{H}^s(\hat{\Gamma})$ is to be understood as
the Cartesian product $\prod_{i=1}^{M} H^s(\hat{\Gamma})$. Finally, given an open bounded neighborhood
$G_i$ such that $\Gamma_i \subset \partial G_i$, Dirichlet traces are defined as extensions to $H^s(G_i)$, for
$s > 1/2$, of the following operator (applied to smooth functions):

$$\gamma_i^{\pm} u(y) := \lim_{\epsilon \downarrow 0} u(y \pm \epsilon\, \mathbf{n}_i(y)),$$

where $\mathbf{n}_i(y)$ is the unitary vector with direction $(r_{i,2}'(t), -r_{i,1}'(t))$ and $t$ such that
$\mathbf{r}_i(t) = y$. For a function $u$ defined in an open neighborhood of $\Gamma_i$ such that $\gamma_i^+ u =
\gamma_i^- u$, we denote $\gamma_i u := \gamma_i^{\pm} u$.


Problem 1 (Volume Problem) Let $g \in \mathbb{H}^{1/2}(\Gamma)$ and $\kappa \geq 0$. We seek $U \in H^1_{loc}(\Omega)$
such that

$$-\Delta U - \kappa^2 U = 0 \quad \text{in } \Omega, \qquad (1)$$

$$\gamma_i U = g_i \quad \text{for } i = 1, \ldots, M, \qquad (2)$$

$$\text{Condition at infinity}(\kappa). \qquad (3)$$

The behavior at infinity (3) depends on $\kappa$ in the following way: if $\kappa > 0$, we employ
the classical Sommerfeld condition [8, Section 3.9]. If $\kappa = 0$, we seek solutions
$U \in W(\Omega)$. This last condition was discussed in detail in [5, Remarks 3.9, 4.2 and
4.5] with uniqueness proofs for $\kappa \geq 0$ provided in [5, Propositions 3.8 and 3.10].
For $\kappa \geq 0$, we can express the solution $U$ of Problem 1 as

$$U(x) = \sum_{i=1}^{M} (\mathrm{SL}_i[\kappa]\lambda_i)(x), \quad \forall x \in \Omega, \qquad (4)$$

where

$$(\mathrm{SL}_i[\kappa]\lambda_i)(x) := \int_{\Gamma_i} G_\kappa(x, y)\,\lambda_i(y)\,d\Gamma_i(y), \quad \forall x \in \Omega,$$

denotes the single layer potential generated at a curve $\Gamma_i$ with $G_\kappa$ the corresponding
fundamental solution, defined as in [8, Section 3.1]. It is direct from (4) that $U$
solves (1)-(2) in $\Omega$ (see [8, Theorem 3.1.1]). Also, it displays the desired behavior
at infinity as long as each $\lambda_i$ lies in the right functional space [5, Section 4]. In order
to find the surface densities $\lambda_i$, we take Dirichlet traces $\gamma_i^{\pm}$ of the $\mathrm{SL}_j$ and impose
boundary conditions (2). This naturally defines weakly singular boundary integral
operators:

$$\mathcal{L}_{ij}[\kappa] := \frac{1}{2}\left(\gamma_i^+ \mathrm{SL}_j[\kappa] + \gamma_i^- \mathrm{SL}_j[\kappa]\right) = \gamma_i \mathrm{SL}_j[\kappa],$$

and an equivalent boundary integral equation problem to Problem 1.
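Problem 2 (Boundary Integral Problem) Let $g \in \mathbb{H}^{1/2}(\Gamma)$. For $\kappa > 0$, we seek
$\lambda = (\lambda_1, \ldots, \lambda_M) \in \tilde{\mathbb{H}}^{-1/2}(\Gamma)$ such that

$$\mathcal{L}[\kappa]\lambda = g,$$

where $\mathcal{L}[\kappa] : \tilde{\mathbb{H}}^{-1/2}(\Gamma) \to \mathbb{H}^{1/2}(\Gamma)$ is a matrix operator with entries $\mathcal{L}[\kappa]_{ij} = \mathcal{L}_{ij}[\kappa]$,
for $i, j \in \{1, \ldots, M\}$. If $\kappa = 0$, we seek $\lambda \in \tilde{\mathbb{H}}^{-1/2}_{\langle 0\rangle}(\Gamma)$, given $g$ in the dual space of
the aforementioned space.


Theorem 1 (Theorem 4.13 in [5]) For $\kappa > 0$, Problem 2 has a unique solution
$\lambda \in \tilde{\mathbb{H}}^{-1/2}(\Gamma)$, whereas for $\kappa = 0$ a unique solution exists in the subspace $\tilde{\mathbb{H}}^{-1/2}_{\langle 0\rangle}(\Gamma)$.
Also, the following continuity estimate holds

$$\|\lambda\|_{\tilde{\mathbb{H}}^{-1/2}(\Gamma)} \leq C(\Gamma, \kappa)\,\|g\|_{\mathbb{H}^{1/2}(\Gamma)}.$$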

3 Spectral Discretization 


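We present a family of finite dimensional subspaces in $\tilde{\mathbb{H}}^{-1/2}(\Gamma)$ that can be used
to approximate the solution of Problem 2 (cf. [4, 6]). Let $\mathbb{T}_N(\hat{\Gamma})$ denote the space
spanned by first kind Chebyshev polynomials, denoted by $\{T_n\}_{n=0}^{N}$, of degree lower
than or equal to $N$ on $\hat{\Gamma}$, orthogonal with respect to the $L^2(-1, 1)$ inner product under the weight
$w^{-1}$, with $w(t) := \sqrt{1 - t^2}$. Now, let us construct elements $p_n^i = T_n \circ \mathbf{r}_i^{-1}$ over each
arc $\Gamma_i$ spanning the space $\mathbb{T}_N(\Gamma_i)$. For practical reasons, we define the normalized
space:

$$\bar{\mathbb{T}}_N(\Gamma_i) := \left\{ \bar{p}^i \in C(\Gamma_i) : \bar{p}^i := \frac{p^i}{\|\mathbf{r}_i' \circ \mathbf{r}_i^{-1}\|_2},\ p^i \in \mathbb{T}_N(\Gamma_i) \right\}.$$

We account for edge singularities by multiplying the basis $\{\bar{p}_n^i\}_{n=0}^{N}$ by a suitable
weight:

$$\mathbb{Q}_N(\Gamma_i) := \left\{ q^i := w_i^{-1}\bar{p}^i : \bar{p}^i \in \bar{\mathbb{T}}_N(\Gamma_i) \right\},$$

wherein $w_i := w \circ \mathbf{r}_i^{-1}$. The corresponding basis for $\mathbb{Q}_N(\Gamma_i)$ will be denoted
$\{q_n^i\}_{n=0}^{N}$. By Chebyshev orthogonality, we can easily define the mean-zero subspace
$\mathbb{Q}_{N,\langle 0\rangle}(\Gamma_i) := \mathbb{Q}_N(\Gamma_i) \setminus \mathbb{Q}_0(\Gamma_i)$, spanned by $\{q_n^i\}_{n=1}^{N}$. With these definitions, we
set the discretization space for a Galerkin-Bubnov solution of Problem 2 as

$$\mathbb{H}_N[\kappa] := \begin{cases} \prod_{i=1}^{M} \mathbb{Q}_{N,\langle 0\rangle}(\Gamma_i) & \text{for } \kappa = 0, \\ \prod_{i=1}^{M} \mathbb{Q}_N(\Gamma_i) & \text{for } \kappa > 0. \end{cases}$$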


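Problem 3 (Linear System) For $\kappa > 0$, let $N \in \mathbb{N}$ and $g \in \mathbb{H}^{1/2}(\Gamma)$ be the same as
in Problem 2. Then, we seek coefficients $\mathbf{u} = (\mathbf{u}_1, \ldots, \mathbf{u}_M) \in \mathbb{C}^{M(N+1)}$ such that

$$\mathsf{L}[\kappa]\mathbf{u} = \mathbf{g}.$$

Therein, we have defined the Galerkin matrix $\mathsf{L}[\kappa] \in \mathbb{C}^{M(N+1) \times M(N+1)}$ composed
of matrix blocks $\mathsf{L}_{ij}[\kappa] \in \mathbb{C}^{(N+1) \times (N+1)}$ whose entries are

$$(\mathsf{L}_{ij}[\kappa])_{lm} = \left\langle \mathcal{L}_{ij}[\kappa] q_m^j, q_l^i \right\rangle_{\Gamma_i} = \left\langle \hat{\mathcal{L}}_{ij}[\kappa] w^{-1} T_m, w^{-1} T_l \right\rangle_{\hat{\Gamma}}.$$

There, $\hat{\mathcal{L}}_{ij}[\kappa]$ is the weakly-singular operator whose kernel is parametrized by $\mathbf{r}_j$, $\mathbf{r}_i$,
and the right-hand side $\mathbf{g} = (\mathbf{g}_1, \ldots, \mathbf{g}_M) \in \mathbb{C}^{M(N+1)}$ has components

$$(\mathbf{g}_i)_l = \left\langle g_i, q_l^i \right\rangle_{\Gamma_i} = \left\langle \hat{g}_i, w^{-1} T_l \right\rangle_{\hat{\Gamma}},$$

where $\hat{g}_i = g_i \circ \mathbf{r}_i$. The approximation $\lambda_N \in \mathbb{H}_N[\kappa]$ is constructed as

$$(\lambda_N)_i = \sum_{m=0}^{N} (\mathbf{u}_i)_m\, q_m^i \quad \text{in } \Gamma_i, \text{ for all } i \in \{1, \ldots, M\}.$$

For $\kappa = 0$ we need $g$ as in Problem 2; we also have $\mathbf{u} \in \mathbb{C}^{MN}$ and $\mathsf{L}[0] \in \mathbb{C}^{MN \times MN}$,
since the approximation space is $\mathbb{H}_N[0]$. By conformity and density of these spaces
in $\tilde{\mathbb{H}}^{-1/2}(\Gamma)$, one derives the following result: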
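Theorem 2 (Theorem 4.23 [4]) Let $\kappa > 0$, $m \in \mathbb{N}$ with $m > 2$, $\Gamma \in C^m$, $g \in
C^m(\Gamma)$, and $\lambda$ be the only solution of Problem 2. Then, there exists $N_0 \in \mathbb{N}$ such
that for every $N > N_0 \in \mathbb{N}$ there is a unique $\lambda_N \in \mathbb{H}_N[\kappa]$ solution of Problem 3.
Moreover, the following error convergence rates hold

$$\|\lambda - \lambda_N\|_{\tilde{\mathbb{H}}^{-1/2}(\Gamma)} \leq C(\Gamma, \kappa)\, N^{-m+1}.$$

Moreover, if $\Gamma$ and $g$ are $\rho$-analytic with $\rho > 1$, we have the following
super-algebraic convergence rates

$$\|\lambda - \lambda_N\|_{\tilde{\mathbb{H}}^{-1/2}(\Gamma)} \leq C(\Gamma, \kappa)\, \rho^{-N+2}\sqrt{N},$$

where $C(\Gamma, \kappa)$ is a positive constant, which does not depend on $N$.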
Remark 1 Observe that the constants $C(\Gamma, \kappa)$ and $N_0$ depend on the geometry and
frequency. To the best of our knowledge, previous convergence results for 2D arcs
are somewhat limited. For intervals, the result was established in [6], whereas for
more general arcs results are only available for the Laplace case [1]. Super-algebraic
convergence rates can be achieved by the method detailed in [3], though their 
scheme is limited to intervals and to the case of elliptic problems (№ = 0). More 
complex cases are still an open problem. 


4 Numerical Implementation and Compression Algorithm 


Before fleshing out our proposed compression technique, we explain how $\mathsf{L}[\kappa]$ and
$\mathbf{g}$ of Problem 3 are computed. For the right-hand side, one must compute integrals
of the form:

$$\int_{-1}^{1} \hat{g}(t)\, w^{-1}(t)\, T_l(t)\, dt, \quad \forall l \in \mathbb{N}_0,$$

which correspond to Fourier-Chebyshev coefficients of $\hat{g}(t)$ and can be
approximated using the Fast Fourier Transform [10]. Computations for matrix terms $\mathsf{L}_{ij}[\kappa]$
are split into two groups: (a) cross-interactions, where the supports of test and trial functions
lie along curves $\Gamma_i$, $\Gamma_j$ with $i \neq j$; and (b) self-interactions, where both
trial and test functions are defined on the same curve. As for cross-interactions the
integral kernel is smooth, we use the same computational procedure as for the
right-hand side.
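As an illustration of the right-hand side computation, the weighted integrals can be approximated with a Gauss-Chebyshev rule. The sketch below uses a plain matrix product for clarity; an FFT-based discrete cosine transform would evaluate the same sums faster. The function name and the node count are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def chebyshev_weighted_moments(g, n_coeffs, n_nodes=64):
    """Approximate int_{-1}^{1} g(t) T_l(t) / sqrt(1 - t^2) dt, l = 0..n_coeffs-1,
    with an n_nodes-point Gauss-Chebyshev rule (nodes t_k = cos(theta_k))."""
    k = np.arange(n_nodes)
    theta = np.pi * (k + 0.5) / n_nodes
    values = g(np.cos(theta))              # samples of g at the quadrature nodes
    l = np.arange(n_coeffs)[:, None]
    # T_l(cos(theta)) = cos(l * theta); every node carries the weight pi / n_nodes.
    return (np.pi / n_nodes) * (np.cos(l * theta[None, :]) @ values)


# Check on g = T_0 + 2 T_2, whose only non-zero moments are pi (l = 0) and pi (l = 2).
moments = chebyshev_weighted_moments(lambda t: 1.0 + 2.0 * (2.0 * t**2 - 1.0), 4)
print(moments)
```

The rule integrates polynomials of degree up to `2 * n_nodes - 1` exactly, so for smooth data the number of nodes mainly has to follow the decay of the coefficients.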

For self-interactions, the kernel function has a singularity that can be
characterized as

$$G_\kappa(\mathbf{r}(t), \mathbf{r}(s)) = -(2\pi)^{-1} \log|t - s|\, J_0(\kappa\|\mathbf{r}(t) - \mathbf{r}(s)\|_2) + G_\kappa^R(t, s), \quad t \neq s,$$

for $t, s \in \hat{\Gamma}$, where $J_0$ is the zeroth-order first kind Bessel function, and $G_\kappa^R$ is
a regular function. Thus, integration for the regular part is done as in the
cross-interaction case, while integrals with the first term as kernel are obtained by
convolution as integrals for $\log|t - s|$ are known (see [6, Remark 4.2]).
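For orientation, the known integrals for the logarithmic kernel are the classical relations with weighted Chebyshev polynomials on $(-1, 1)$. They are quoted here from the standard literature rather than from [6], so the normalisation used in that reference may differ:

```latex
-\frac{1}{\pi}\int_{-1}^{1} \log|t - s|\,\frac{T_n(s)}{\sqrt{1 - s^2}}\,ds =
\begin{cases}
  \log 2, & n = 0,\\[4pt]
  \dfrac{1}{n}\,T_n(t), & n \geq 1.
\end{cases}
```

Combined with the orthogonality of the $T_n$ under the weight $w^{-1}$, these relations make the Galerkin matrix of the pure logarithmic kernel on the canonical segment diagonal in the basis $w^{-1}T_n$.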

Yet, as $\kappa$ increases, larger values of $N$ will be required, hence the need
to compress the resulting matrix terms. As stated in [10, Chapters 7 and 8], the
regularity of a function controls the decay of its Fourier-Chebyshev coefficients.
Hence, as the entries of the matrix $\mathsf{L}[\kappa]$ are precisely such coefficients, for a smooth
kernel one observes fast decaying terms. This implies that we can select small
blocks to approximate the matrix and obtain a sparse approximation by discarding
the remaining entries, based on a predetermined tolerance $\varepsilon > 0$. Specifically,
the kernel function is smooth when we compute cross-interactions. Let the routine
Quadrature(l,m) compute the term $(l, m)$ of this interaction matrix using a 2D
Gauss-Chebyshev quadrature. Given a tolerance $\varepsilon > 0$, we minimize the number of
computations needed by performing the following binary search:


Matrix Compression Algorithm

INPUT: Tolerance (Tol), Max level of search (Lmax)
OUTPUT: Number of columns to use (Ncols)
INITIALIZE: Ncols = N, level = 0, a = 0, b = N
While {level < Lmax}
    m = (a+b)/2
    Tleft = m-1
    Tcenter = m
    Tright = m+1
    Vleft = abs(Quadrature(0,Tleft))
    Vcenter = abs(Quadrature(0,Tcenter))
    Vright = abs(Quadrature(0,Tright))
    If {Vright & Vcenter < 0.5*Tol} or {Vleft & Vcenter < 0.5*Tol}
        b = m
    Else
        a = m
    EndIf
    level++
EndWhile
Ncols = b


The algorithm returns the minimum number of columns required, Ncols, by
searching in the first row the minimum index such that the matrix entries' absolute
value is lower than $\varepsilon$. The binary search is restricted to a depth Lmax $\in \mathbb{N}$. The
same procedure is used to estimate the number of rows, $N_{rows}$, by executing a
binary search in the first column. Once $N_{cols}$ and $N_{rows}$ are selected, we define
$N_\varepsilon := \max\{N_{rows}, N_{cols}\}$ and compute the block of size $N_\varepsilon \times N_\varepsilon$ as in the full
matrix implementation.
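A runnable reading of the binary search is sketched below. It rests on assumptions that the pseudocode leaves open: the midpoint uses integer division, the test `X & Y < 0.5*Tol` is interpreted as "both values are below half the tolerance", and the loop stops early once the interval can no longer shrink. The callable `quadrature` stands in for the paper's Quadrature routine and is replaced here by a synthetic, geometrically decaying row.

```python
from typing import Callable


def compressed_columns(quadrature: Callable[[int, int], complex],
                       n: int, tol: float, max_level: int) -> int:
    """Binary search along the first row for the number of columns to keep."""
    a, b = 0, n
    half_tol = 0.5 * tol
    for _ in range(max_level):
        m = (a + b) // 2
        if m <= a:                      # interval [a, b] cannot shrink further
            break
        v_left = abs(quadrature(0, m - 1))
        v_center = abs(quadrature(0, m))
        v_right = abs(quadrature(0, m + 1))
        right_small = v_right < half_tol and v_center < half_tol
        left_small = v_left < half_tol and v_center < half_tol
        if right_small or left_small:
            b = m                       # entries near m are negligible: search left
        else:
            a = m                       # entries near m still matter: search right
    return b


# Synthetic first row with entries 2^(-m); a hypothetical stand-in for real quadrature values.
decaying_row = lambda l, m: 2.0 ** (-m)
n_cols = compressed_columns(decaying_row, n=64, tol=1e-3, max_level=10)
print(n_cols)
```

Limiting `max_level` trades accuracy of the estimate for fewer quadrature evaluations: a shallow search returns a larger, safer block size.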

The matrix compression percentage will strongly depend on the regularity of the 
arcs involved. For $\rho$-analytic arcs, using [10, Theorem 8.1] we can prove the lower
bound: 


№ > зас 
2Y log p 


where Y is an upper bound for the absolute value of the kernel in the corresponding 
Bernstein ellipse. However, since compression is done by a binary search, the bound 
for the compression rate depends on Lmax as 

Compression of self-interaction blocks does not follow the same ideas. In fact, these
blocks can be characterized as two perturbations over the canonical case, $\Gamma = \hat{\Gamma}$ for
$\kappa = 0$, leading to a diagonal matrix. Namely, these are


1. A low frequency perturbation caused by the mapping $\mathbf{r}_i : \hat{\Gamma} \to \Gamma_i$, similar to the
cross-interaction case.
2. A frequency perturbation that creates banded matrices.


In order to reduce memory consumption—though not computational time—we 
discard the entries of the self-interaction matrices lower than the given tolerance. 
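The entrywise thresholding described above can be sketched as follows, together with the percentage of stored entries (the %NNZ quantity reported later). The function name and the synthetic banded block are assumptions made for illustration; a production code would store the result in a sparse format.

```python
import numpy as np


def threshold_block(block, tol):
    """Zero out entries with modulus below `tol`; return the compressed block
    and the percentage of entries that are kept."""
    kept = np.abs(block) >= tol
    compressed = np.where(kept, block, 0.0)
    pct_nnz = 100.0 * kept.sum() / block.size
    return compressed, pct_nnz


# Synthetic self-interaction block with entries 10^(-|i-j|) (hypothetical data).
idx = np.arange(6)
block = 10.0 ** (-np.abs(idx[:, None] - idx[None, :]))
compressed, pct_nnz = threshold_block(block, tol=5e-3)
print(f"%NNZ = {pct_nnz:.2f}")
```

By construction, every discarded entry has modulus below the tolerance, which is the entrywise assumption used in the error analysis of the compressed system.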

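As expected, matrix compression induces an extra error as it perturbs the original
linear system solved by $\lambda_N$ in Problem 3. We denote by $\mathsf{L}_\varepsilon[\kappa]$ the matrix generated
by the compression algorithm with tolerance $\varepsilon$, and define the matrix difference
$\Delta\mathsf{L}_\varepsilon[\kappa] := \mathsf{L}_\varepsilon[\kappa] - \mathsf{L}[\kappa]$. We seek to control the solution $\mathbf{u}_\varepsilon = \mathbf{u} + \Delta\mathbf{u}$ of

$$(\mathsf{L}[\kappa] + \Delta\mathsf{L}_\varepsilon[\kappa])\,\mathbf{u}_\varepsilon = \mathbf{g},$$

where $\mathbf{u}$ and $\mathbf{g}$ are the same as in Problem 3. In order to bound this error, we will
assume that, for every pair of indices $(i, j)$ in the matrix $\mathsf{L}[\kappa]$, we have

$$|(\Delta\mathsf{L}_\varepsilon[\kappa])_{ij}| \leq \varepsilon. \qquad (5)$$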

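Theorem 3 Let $N \in \mathbb{N}$ be such that there is only one $\lambda_N$ solution of Problem 3. Then,
there is a constant $C(\Gamma, \kappa) > 0$, not depending on $N$, such that

$$\frac{\|\Delta\mathbf{u}\|_2}{\|\mathbf{u}\|_2} \leq \frac{N\varepsilon}{C(\Gamma, \kappa) - N\varepsilon}.$$

Proof By [7, Section 1.13.2] we have that

$$\frac{\|\Delta\mathbf{u}\|_2}{\|\mathbf{u}\|_2} \leq \frac{\|\Delta\mathsf{L}_\varepsilon[\kappa]\|_2}{\|\mathsf{L}[\kappa]^{-1}\|_2^{-1} - \|\Delta\mathsf{L}_\varepsilon[\kappa]\|_2},$$

and thus, we need to estimate $\|\Delta\mathsf{L}_\varepsilon[\kappa]\|_2$ and $\|\mathsf{L}[\kappa]^{-1}\|_2^{-1}$. The bound for the first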


term is direct from (5) and matrix norm definitions. By the classical bound of a 
matrix inverse and the continuity of the associated boundary integral operator, it 
holds that 


[Le'o], = в" = се, г), 


from where the result follows directly. o 
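The classical perturbation estimate used in the proof can be checked numerically on a small synthetic system. The matrix, right-hand side and perturbation below are random stand-ins (assumptions for illustration only), not Galerkin matrices of the method; the point is only the inequality relating the relative error to the norms of the perturbation and of the inverse.

```python
import numpy as np


def perturbation_bound(L, dL):
    """Bound ||dL|| / (||L^{-1}||^{-1} - ||dL||) in the spectral norm;
    it requires ||dL|| to be smaller than the smallest singular value of L."""
    sigma_min = np.linalg.svd(L, compute_uv=False)[-1]   # equals 1 / ||L^{-1}||_2
    norm_dl = np.linalg.norm(dL, 2)
    if norm_dl >= sigma_min:
        raise ValueError("perturbation too large for the bound to apply")
    return norm_dl / (sigma_min - norm_dl)


rng = np.random.default_rng(0)
n, eps = 40, 1e-6
L = n * np.eye(n) + rng.standard_normal((n, n))      # well-conditioned test matrix
dL = eps * rng.uniform(-1.0, 1.0, size=(n, n))       # entries bounded by eps, cf. (5)
g = rng.standard_normal(n)

u = np.linalg.solve(L, g)
u_eps = np.linalg.solve(L + dL, g)
rel_err = np.linalg.norm(u_eps - u) / np.linalg.norm(u)
bound = perturbation_bound(L, dL)
print(f"relative error {rel_err:.2e} <= bound {bound:.2e}")
```

Since the spectral norm is bounded by the Frobenius norm, an entrywise bound of `eps` on an `n`-by-`n` perturbation gives a norm of at most `n * eps`, which is how a factor proportional to the matrix size enters the estimate.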
We can also estimate the error introduced by the compression algorithm in terms of
the energy norm. In order to do so, define $(\lambda_N^\varepsilon)_i := \sum_{m=0}^{N} (\mathbf{u}_{\varepsilon,i})_m\, q_m^i$ in $\Gamma_i$. By the
same arguments as in the above proof, we obtain


є№3/2 


— E Cale PV_eNnN 
[Ам — X] нт)? Cj, T) EN” 


acta = Cık, Г) || 


where $g$ is the same as in Problem 2 and $C_1(\kappa, \Gamma)$, $C_2(\kappa, \Gamma)$ are two different
constants.


Remark 2 Our compression algorithm produces a faster and less memory demand- 
ing implementation of the spectral Galerkin method at the cost of accuracy loss, 
similar to fast multipole or hierarchical matrices methods. Moreover, once we have 
compressed the matrix, we can implement a fast matrix-vector product.


5 Numerical Results 


To illustrate the above claims, Fig. 1 presents convergence results for different
wavenumbers, $\kappa = 0, 25, 50, 100$, for a configuration of $M = 28$ arcs. As the
chosen geometry and excitation are given by analytic functions, Theorem 2 predicts
an exponential rate of convergence, as observed numerically.

Table 1 provides matrix compression results for $\kappa = 100$ and for the same
geometry of Fig. 1. It presents the percentage of non-zero entries (%NNZ) and
relative errors as bounded in Theorem 3 as functions of the maximum level of binary
search (Lmax), tolerances ($\varepsilon$), and polynomial order per arc (Order). For low orders
(Order < 60), relative errors are quite large, and therefore, most of the matrix terms
are kept. This is due to an insufficient number of matrix entries to solve the problem
with good accuracy (see Fig. 1), rendering compression pointless. On the other hand,
once convergence is achieved, the compression error drastically decreases along
with the percentage of matrix terms stored.


Fig. 1 (a) Smooth geometry with $M = 28$ open arcs parametrized as $\mathbf{r}_i(t) = (a_i t, c_i \sin(b_i t) + d_i)$,
with $a_i \in [0.14, 0.25]$, $b_i \in [0, 0.2]$, $c_i \in [1, 2]$, $d_i \in [0, 20]$, $t \in [-1, 1]$. (b) Convergence results
in the $\tilde{\mathbb{H}}^{-1/2}(\Gamma)$-norm for different wavenumbers and a plane-wave excitation along (1, 1). Errors
computed against an overkill solution using $N = 660$ per arc


Table 1 Compression performance for $\kappa = 100$

Order   %NNZ    Rel. error
ε = 1e-6
5       65.24   5.05e-01
10      81.62   5.32e-01
20              2.33e-01
40      67.11   9.10e-04
60      33.36   3.31e-07
80      19.50   3.35e-07
ε = 1e-10
5       65.29   5.05e-01
10      81.68   5.32e-01
20      89.44   2.33e-01
40      76.28   9.10e-04
60      40.70   3.89e-09
m       17:10


References 


1. Atkinson, K.E., Sloan, I.H.: The numerical solution of first-kind logarithmic-kernel integral 
equations on smooth open arcs. Math. Comput. 56(193), 119-139 (1991) 

2. Costabel, M., Dauge, M.: Crack singularities for general elliptic systems. Math. Nachr. 235(1), 
29-49 (2002) 

3. Hewett, D.P., Langdon, S., Chandler-Wilde, S.N.: A frequency-independent boundary element 
method for scattering by two-dimensional screens and apertures. IMA J. Numer. Anal. 35(4), 
1698-1728 (2014) 

4. Jerez-Hanckes, C., Pinto, J.: High-order Galerkin method for Helmholtz and Laplace problems 
on multiple open arcs. Technical Report 2018-49, Seminar for Applied Mathematics, ETH 
Zürich (2018) 

5. Jerez-Hanckes, C., Pinto, J.: Well-posedness of Helmholtz and Laplace problems in unbounded 
domains with multiple screens. Technical Report 2018-45, Seminar for Applied Mathematics, 
ETH Zürich (2018) 

6. Jerez-Hanckes, C., Nicaise, S., Urzúa-Torres, C.: Fast spectral Galerkin method for logarithmic
singular equations on a segment. J. Comput. Math. 36(1), 128-158 (2018)

7. Saad, Y.: Iterative Methods for Sparse Linear Systems. Computer Science Series. PWS
Publishing Company, Boston (1996)

8. Sauter, S., Schwab, C.: Boundary Element Methods. Springer Series in Computational
Mathematics. Springer, Berlin (2010)

9. Stephan, E.P.: A boundary integral equation method for three-dimensional crack problems in
elasticity. Math. Methods Appl. Sci. 8(4), 609-623 (1986)

10. Trefethen, L.: Approximation Theory and Approximation Practice. Other Titles in Applied
Mathematics. SIAM, Philadelphia (2013)

11. Verrall, G., Slavotinek, J., Barnes, P., Fon, G., Spriggins, A.: Clinical risk factors for hamstring
muscle strain injury: a prospective study with correlation of injury by magnetic resonance
imaging. Br. J. Sports Med. 35(6), 435-439 (2001)


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons licence and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter's Creative 
Commons licence, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter's Creative Commons licence and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Explicit Polynomial Trefftz-DG Method
for Space-Time Elasto-Acoustics


H. Barucq, H. Calandra, J. Diaz, and E. Shishenina 


1 Trefftz-DG Formulation for the Elasto-Acoustic Equation 


Trefftz methods are particular finite element methods where the basis and test
functions are locally solutions to the partial differential equation that governs the
problem to be solved. Compared with the existing literature on frequency-domain
problems, space-time Trefftz methods are still not widely used. One reason could be
that they require space-time meshes [6, 12]. To our knowledge, few references
on Trefftz approximations of time-dependent wave equations are available, and they
mainly address theoretical properties in the case of acoustics and electromagnetism
[4, 8, 10, 11]. They provide convergence and stability studies, and some numerical
results are displayed using plane wave bases in 1D + time dimensions. Numerical
results in 2D + time dimensions are proposed in [4] for electromagnetism. There are
also some studies devoted to the second-order formulation of the acoustic wave
equation approximated in Trefftz spaces by means of Lagrange multipliers [1, 13].
In [3], we proposed a Trefftz-DG formulation for elasto-acoustics. The method
required the inversion of a huge sparse matrix. The goal of this paper is to show how
to derive a semi-explicit scheme, requiring only the inversion of a block-diagonal
matrix on each element of the mesh.

In this section, following [10] and the framework therein, we propose a
formulation of the elasto-acoustic coupling reading as a first-order system. Here and further,
the subscripts F and S correspond to the acoustic (fluid) and elastodynamic (solid)
domains.


1.1 Elasto-Acoustic Equations 


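We introduce a space-time domain $Q = (\Omega_F \cup \Omega_S) \times I$, where $\Omega_F \subset \mathbb{R}^d$ is a
bounded Lipschitz domain of dimension $d$ filled with fluid, $\Omega_S \subset \mathbb{R}^d$ is a bounded
Lipschitz elastodynamic domain of dimension $d$ filled with solid, and $I = [0, T]$ is
the time interval. All medium parameters $c_F = c_F(x)$ and $\rho_F = \rho_F(x)$, standing
for the acoustic wave propagation velocity and fluid density respectively, as well
as the inverted stiffness tensor $C^{-1} = A(x)$ and the solid density $\rho_S = \rho_S(x)$,
are assumed to be piecewise constant and positive. We denote by $\Gamma_{FS} = \partial\Omega_F \cap
\partial\Omega_S$ the fluid-solid interface. The elasto-acoustic system of equations is based on
the coupling of the first-order acoustic equation, written in terms of velocity $v_F =
v_F(x, t)$ and pressure $p = p(x, t)$ fields:

$$\begin{aligned}
\frac{1}{c_F^2\rho_F}\frac{\partial p}{\partial t} + \operatorname{div} v_F &= f && \text{in } \Omega_F \times I,\\
\rho_F\frac{\partial v_F}{\partial t} + \nabla p &= 0 && \text{in } \Omega_F \times I,\\
v_F(\cdot, 0) = v_{F0}, \quad p(\cdot, 0) &= p_0 && \text{in } \Omega_F,\\
v_F \cdot n_{\Omega_F} &= g_F && \text{on } (\partial\Omega_F \setminus \Gamma_{FS}) \times I,
\end{aligned} \qquad (1)$$

where $n_{\Omega_F}$ is the normal vector to $\partial\Omega_F$, $f = f(x, t)$ is the source term, $g_F$ is the boundary
condition, and the velocity $v_{F0}$ and the pressure $p_0$ are the initial data, with the
first-order elastodynamic system, written in terms of velocity $v_S = v_S(x, t)$ and stress
tensor (symmetric and positive) $\sigma = \sigma(x, t)$ fields:

$$\begin{aligned}
A\frac{\partial \sigma}{\partial t} - \varepsilon(v_S) &= 0 && \text{in } \Omega_S \times I,\\
\rho_S\frac{\partial v_S}{\partial t} - \operatorname{div}\sigma &= 0 && \text{in } \Omega_S \times I,\\
v_S(\cdot, 0) = v_{S0}, \quad \sigma(\cdot, 0) &= \sigma_0 && \text{in } \Omega_S,\\
\sigma\, n_{\Omega_S} &= g_S && \text{on } (\partial\Omega_S \setminus \Gamma_{FS}) \times I,
\end{aligned} \qquad (2)$$

where $n_{\Omega_S}$ is the normal vector to $\partial\Omega_S$, $g_S$ is the boundary condition, and the velocity $v_{S0}$
and the stress tensor $\sigma_0$ are the initial data. The transmission conditions between
the two systems (1) and (2) represent the continuity of the normal components of velocity and
stress at $\Gamma_{FS}$:

$$\begin{aligned}
v_F \cdot n_{\Gamma_{FS}} &= v_S \cdot n_{\Gamma_{FS}} && \text{at } \Gamma_{FS},\\
-p\, n_{\Gamma_{FS}} &= \sigma\, n_{\Gamma_{FS}} && \text{at } \Gamma_{FS}.
\end{aligned} \qquad (3)$$

The velocities aligned with the interface and the tangential stress remain
unconstrained.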


1.2 Space-Time Trefftz-DG Formulation 


We introduce a non-overlapping space-time mesh $\mathcal{T}_h$ on $Q$ composed of space-time
Lipschitz elements $K_F \subset \Omega_F \times I$ and $K_S \subset \Omega_S \times I$. We denote by $\mathcal{T}_{Fh}$ (resp.
$\mathcal{T}_{Sh}$) the restriction of $\mathcal{T}_h$ to the fluid (resp. solid) domain. Let $n_{K_F} := (n^x_{K_F}, n^t_{K_F})$
be the outward-pointing unit normal vector on $\partial K_F$, and $n_{K_S} := (n^x_{K_S}, n^t_{K_S})$ be
the outward-pointing unit normal vector on $\partial K_S$. We assume that all medium
parameters are constant in $K_F$ and $K_S$ respectively. The mesh skeleton $\mathcal{F}_h =
\bigcup_{K_{F,S} \in \mathcal{T}_h} \partial K_{F,S}$ can be decomposed into families of the internal faces $\mathcal{F}_h^I$, the
fluid-solid faces $\mathcal{F}_h^{FS}$, the boundary faces $\mathcal{F}_h^D$, and the initial and final time
element faces $\mathcal{F}_h^0$ and $\mathcal{F}_h^T$ respectively, as shown in Fig. 1. We introduce the space $V_h(\mathcal{T}_h)$
as a subspace of $L^2(Q)$ defined by $V_h(\mathcal{T}_h) = \{\phi \in L^2(Q),\ \phi|_{K_{F,S}} \in \mathbb{P}^p(K_{F,S})\}$.
The unknowns $(v_{Fh}, p_h, v_{Sh}, \sigma_h)$ are supposed to be in $\mathbf{V}_h(\mathcal{T}_h) = V_h(\mathcal{T}_{Fh})^d \times
V_h(\mathcal{T}_{Fh}) \times V_h(\mathcal{T}_{Sh})^d \times V_h(\mathcal{T}_{Sh})^{d^2}$. We consider the test functions $\omega_F$, $q$, $\omega_S$, $\xi$ in
$T(\mathcal{T}_h)$ for $v_F$, $p$, $v_S$ and $\sigma$ respectively, where the Trefftz space $T(\mathcal{T}_h)$ is defined on
the mesh $\mathcal{T}_h$ as follows:

$$T(\mathcal{T}_h) := \Big\{ (\omega_F, q, \omega_S, \xi) \in \mathbf{V}_h(\mathcal{T}_h) \ \text{s.t.}\ \frac{1}{c_F^2\rho_F}\frac{\partial q}{\partial t} + \operatorname{div}\omega_F = 0,\ \rho_F\frac{\partial \omega_F}{\partial t} + \nabla q = 0 \ \ \forall K_F \in \mathcal{T}_{Fh},$$
$$\text{and}\ A\frac{\partial \xi}{\partial t} - \varepsilon(\omega_S) = 0,\ \rho_S\frac{\partial \omega_S}{\partial t} - \operatorname{div}\xi = 0 \ \ \forall K_S \in \mathcal{T}_{Sh} \Big\}.$$

Fig. 1 Example of 1D + time mesh $\mathcal{T}_h$ covering $Q$ (horizontal axis: space domain, with $\Omega_F$ and $\Omega_S$;
vertical axis: time $t$). The internal element faces $\mathcal{F}_h^I$ are represented by dotted
lines, the element faces of the fluid-solid interface $\mathcal{F}_h^{FS}$ by dash-dotted lines, the
boundary element faces $\mathcal{F}_h^D$ by thick lines, and the initial $\mathcal{F}_h^0$ and final $\mathcal{F}_h^T$ time
element faces by double and dashed lines respectively

This space is of Trefftz type since it is a subspace of the regular space $\mathbf{V}_h(\mathcal{T}_h)$
composed of local solutions of the volumic governing equations (1) and (2) set in
each element $K_F$ and $K_S$ respectively.

As in the standard DG methods, the next step in order to obtain the variational
formulation consists in multiplying the equations of (1) by the test functions $q$ and
$\omega_F$ in $T(\mathcal{T}_h)$, and the equations of (2) by the test functions $\xi$ and $\omega_S$ in $T(\mathcal{T}_h)$
respectively. As is standard in space-time DG methods, we then integrate by parts
the obtained equations not only in space but also in time:


: p V Ad ^ Y 
xj | 2 Prank, + ЧЯкь Mk, + pr rn ®Е n, + Pror My, |ds+ 
Krag, FPF 

F 


у, f [Ag тк. — #55 Nk, + 05 Vsn -ØS nks —6,: (os 8 ny) |ds = 


№. | fadv. (4) 


Thanks to the choice of test functions, the left-hand side of the above space-time
formulation contains only surface integrals. The numerical fluxes in time and in space
associated with $v_{Fh}$, $p_h$, $v_{Sh}$ and $\sigma_h$ are defined in the standard DG notations
[2, 3, 7] as follows:


Ven Nx, Vsh c Nk, + 51 (пк, + Pang) Пк, 

Ph |р + о (ур пк. — Vsn Пу) ЕРЕ. 
Vsh — | Ysa = 61 (6,0, + punt.) №? 
бтк, — рп, + o (УРА Пк, — Vsh ` Dk, Mk 


Ven Mx, \ _ [бк 
Рь Ph + а1(уғһ Nk, — 8Е)] 


Vsn _ [Узп = (вк, — gs) on FP 
и = ae 
брк; 85 | 
Ven\ _ (Vra + Fillon Vsn) (0500—0100, по
Pho]  \{ры- аук ] ên) ME- у УЕ di 
УЕ NEL + olv ral: Vsh _ (Esn МУР ) on Fe 
Ph {ри} + balp J’ б, {о} + 5210,1, ii 
Ven\ _ (Ven Vsn\ _ [ Vsn on #Ї 
Ph Ph і 9, 9, | 
Ув) _ (5 — a2)v rh + (3 + @2)VFo 
Pn G — Bo) pn + G + B2) po 

Vsn\) — (G = vnvsn + ($ + у2)у50 

[: ) = (4 — 322, + @ +8260 a 


Here, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\delta_1$, $\delta_2$, $\gamma_1$ and $\gamma_2$ are positive penalty parameters. As in
standard DG methods, a suitable choice of these penalty parameters allows one to
prove stability of the overall method. It is shown in [2, 3] that they contribute to
the accuracy and convergence of the numerical method. We refer to [2, 3] for more
details on the definition of the numerical fluxes.

Summing the contribution (4) of all elements $K_F, K_S \in \mathcal{T}_h$, and introducing
the bilinear form $\mathcal{A}_{TDG}(\cdot\,;\cdot)$ and the linear form $\ell_{TDG}(\cdot)$ for the left-hand side and the
right-hand side expressions respectively, we obtain the Trefftz-DG formulation for
the elasto-acoustic problem:

Seek $(v_{Fh}, p_h, v_{Sh}, \sigma_h) \in T(\mathcal{T}_h)$ such that, for all $(\omega_F, q, \omega_S, \xi) \in T(\mathcal{T}_h)$,
it holds true:

$$\mathcal{A}_{TDG}\big((v_{Fh}, p_h, v_{Sh}, \sigma_h); (\omega_F, q, \omega_S, \xi)\big) = \ell_{TDG}(\omega_F, q, \omega_S, \xi). \qquad (5)$$


The analysis of well-posedness of (5) is based on the coercivity and continuity
estimates of the bilinear and linear forms in mesh-dependent norms [2, 3]. The proof 
is similar to the one given in [10] where the acoustic wave equation is addressed. In 
Sect. 2 we provide the algorithm of the Trefftz-DG formulation (5), and we discuss 
different analytical and numerical approaches for its optimization. 


2 Implementation of the Algorithm 


The numerical implementation of the Trefftz-DG formulation differs from that of
standard DG methods, which address the space and time integrations separately. Standard
DG space discretizations have the attractive feature of leading to a block-diagonal
mass matrix, which allows the use of explicit time integration. The computational
cost then depends on a CFL condition which sets the value of the time step as a
function of the space step. On the other hand, a naive implementation of Trefftz-DG
methods requires a space-time integration, which leads to the inversion of a
sparse matrix whose size tends to be huge. It is thus not obvious that a crude
implementation of the Trefftz-DG algorithm does not generate additional cost
compared to standard DG ones.

In this section we describe some important steps in the implementation of the Trefftz-DG
formulation (5) and discuss optimization techniques. The complete algorithm with
more numerical details can be found in [2, 3].


2.1 Change-Over Between the Time Slabs 


To simplify the presentation, we assume here that we use the same order of
approximation on each cell, so that we have $N^F_{dof}$ degrees of freedom on fluid
cells and $N^S_{dof}$ degrees of freedom on solid cells. Once we have defined the discrete
approximation space, we can solve the problem inside each element $K_F$ and $K_S$,
communicating the corresponding values at the boundaries $\partial K_F$ and $\partial K_S$ by the
incoming and outgoing fluxes. Thus, the variational problem is represented by an
algebraic linear system with a sparse matrix $M$, whose size equals the total number
of elements $N_{el}$ multiplied by the number of degrees of freedom per element $N_{dof}$,
that is $N^F_{el} \times N^F_{dof} + N^S_{el} \times N^S_{dof}$. Compared with a
standard DG implementation, the Trefftz-DG cost is thus increased,
mainly because of the need to invert this large matrix. The most obvious
way to reduce the size of the matrix, which is classically used in most work on
space-time Trefftz methods, is to consider time slabs. We restrict ourselves to the
case of Cartesian meshes, but this methodology can also be applied to unstructured
meshes. An alternative is to use tent-pitched meshes that respect causality;
this will be the topic of future work. In order to optimize the execution of the
algorithm, we propose to divide the space-time domain $Q$ into $N_t$ elementary time
slabs $Q_1, Q_2, \dots, Q_{N_t}$, and to solve the problem slab by slab, taking the final
results computed in the current time slab at time $t$ as initial values for the next
slab at time $t + \Delta t$ (see Fig. 2). Thus the size of the matrix inside each time slab is $N_t$
times smaller than the initial one. Moreover, if the medium parameters are
fixed in time, and the space discretization is preserved from slab to slab, the matrix
can be computed and inverted once, and then reused for the next time slabs,
thus reducing the global numerical cost.

Fig. 2 Example of 1D + time mesh $\mathcal{T}_h$ on $Q$ decomposed into $N_t$ time slabs
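
The slab-by-slab procedure described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the slab matrix `M` and the matrix `B` that transfers the final-time values of one slab to the next are hypothetical small dense matrices, and `invert` is a plain Gauss-Jordan inverse. The sketch only shows that, when the medium and the space discretization do not change, the inverse is computed once and reused for every slab.

```python
def invert(M):
    """Gauss-Jordan inverse of a small dense matrix given as a list of rows."""
    n = len(M)
    # augment M with the identity matrix
    A = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # partial pivoting: bring the largest entry of the column to the diagonal
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def march(M, B, u0, n_slabs):
    """Solve M u_k = B u_{k-1} slab by slab, reusing a single inverse of M."""
    M_inv = invert(M)                      # computed once, valid for every slab
    u = list(u0)
    history = [u]
    for _ in range(n_slabs):
        # final values of the current slab become initial data of the next one
        u = matvec(M_inv, matvec(B, u))
        history.append(u)
    return history


M = [[2.0, 1.0], [0.0, 4.0]]   # hypothetical slab matrix
B = [[2.0, 0.0], [0.0, 2.0]]   # hypothetical transfer of final-time values
history = march(M, B, [1.0, 1.0], n_slabs=3)
```

In an actual Trefftz-DG code the matrix would be sparse and a factorization would replace the explicit inverse; the control flow, however, stays the same.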


2.2 Polynomial Basis 


One of the important advantages of Trefftz-type methods is the flexibility in the
choice of basis functions, provided they satisfy the Trefftz property locally in each
element. To perform the numerical simulations, we have extended the algorithm
proposed by Maciag in [9] for computing wave polynomials, solutions of the second-order
transient wave equation, to the first-order acoustic and elastodynamic systems
of dimension one and higher. It consists in computing a polynomial basis, defined on
the reference element, using Taylor expansions of generating exponential functions
which are local solutions of the initial system of equations. An example of a space-time
wave polynomial basis for the first-order acoustic wave equation reads as
follows (approximation degree $p = 3$, dimension of the physical space $d = 1$):


$i =0 фу =1 фу =x фу —crt 

2р Ар Ар ‚2 Ар 

фу = —cr ф = 0 фз = сі $4 = —crx 

Е 2 5 su " 3 2 42 X 3 2 

Д-Р peace #2 gy = SP нн 
2,2 3,3 12:2. 

^p p `p 2 cit ^p crt crt, AP 23 єстї 

$5 = срхі фе = ск(® 75-) фу = cr 4+ 5) фр = crl + 2 ) 


This basis contains the pairs of polynomial functions $(\phi^v, \phi^p)$, corresponding
to the velocity and pressure respectively, which are locally defined, satisfy the
Trefftz property inside each element of the mesh, and are of degree less than or equal to $p$
($p = 0, 1, 2, 3$) to provide an approximation of order $p$. By construction, the
Trefftz basis functions are not attached to the coordinates of the degrees of freedom
inside the element, contrary to the Lagrange polynomials. Even though we compute only
surface integrals, we can evaluate the final approximate solution at any point of
the element. We refer to [2] for more numerical details as well as for
acoustic and elastodynamic basis examples in higher dimensions.
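
The Trefftz property can be illustrated on the simplest case mentioned above, the 1D second-order wave equation $u_{tt} = c^2 u_{xx}$, for which the polynomials $(x \pm ct)^k$ are exact solutions. The sketch below is not the first-order basis used in this work, nor the algorithm of [9]; the function names and the integer wave speed `c` are assumptions chosen so that the arithmetic is exact. It builds these polynomials and checks that the wave-equation residual vanishes coefficient by coefficient.

```python
from math import comb


def wave_polynomial(k, c, sign):
    """Coefficients {(i, j): a} of (x + sign*c*t)**k = sum of a * x**i * t**j."""
    return {(k - j, j): comb(k, j) * (sign * c) ** j for j in range(k + 1)}


def d2(poly, var):
    """Second partial derivative with respect to x (var=0) or t (var=1)."""
    out = {}
    for (i, j), a in poly.items():
        e = (i, j)[var]
        if e >= 2:
            key = (i - 2, j) if var == 0 else (i, j - 2)
            out[key] = out.get(key, 0) + a * e * (e - 1)
    return out


def wave_residual(poly, c):
    """Coefficients of u_tt - c**2 * u_xx; all zero for a Trefftz function."""
    res = dict(d2(poly, 1))
    for key, a in d2(poly, 0).items():
        res[key] = res.get(key, 0) - c * c * a
    return res


c = 2  # hypothetical integer wave speed
polys = [wave_polynomial(k, c, s) for k in range(1, 4) for s in (+1, -1)]
residuals = [wave_residual(phi, c) for phi in polys]
```

A polynomial that is not a wave polynomial, such as $x^2$, leaves a non-zero residual, which is what distinguishes a Trefftz space from a full polynomial space.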


2.3 Inversion of the Matrix M Inside a Time Slab 


The inversion of the matrix inside the time slab can be explicitly reduced to the
inversion of its block-diagonal component, which corresponds to the integration at
the bottom and top of the time slab (the initial and final time faces), thanks
to a Taylor expansion. More precisely, let us recall the structure of the
bilinear form $\mathcal{A}_{TDG}(\cdot;\cdot)$ from Sect. 1.2:

\[ \mathcal{A}_{TDG}(\cdot;\cdot) = \mathcal{A}^{\Omega}_{TDG}(\cdot;\cdot) + \mathcal{A}^{I}_{TDG}(\cdot;\cdot). \]

It consists of $\mathcal{A}^{\Omega}_{TDG}(\cdot;\cdot)$, which corresponds to the integration at the initial and
final time element faces of the time slab, and $\mathcal{A}^{I}_{TDG}(\cdot;\cdot)$, which corresponds to the
integration at the internal, boundary and fluid-solid element faces. Thus, the matrix
$M$ can be represented by the sum of two matrices $\Lambda_\Omega M_\Omega$ and $\Lambda_I M_I$ corresponding
to $\mathcal{A}^{\Omega}_{TDG}$ and $\mathcal{A}^{I}_{TDG}$ respectively, as follows:

\[ M = \Lambda_\Omega M_\Omega + \Lambda_I M_I. \]

Here, $\Lambda_\Omega \propto (\Delta x)^d$ represents the area of the local faces at the initial and final times, and
$\Lambda_I \propto (\Delta x)^{d-1} \Delta t$ represents the area of the local internal, boundary and fluid-solid faces.
We refer to [2] for more details.

This decomposition is of particular interest since $M_\Omega$ is block-diagonal, each
block corresponding to one element. Indeed, we have:

\[ \Lambda_\Omega M_\Omega + \Lambda_I M_I = (\Lambda_\Omega M_\Omega) \Big( I + \frac{\Lambda_I}{\Lambda_\Omega} M_\Omega^{-1} M_I \Big) = (\Lambda_\Omega M_\Omega)(I + \kappa P). \]

Here $I$ is the identity matrix, $\kappa = \Lambda_I / \Lambda_\Omega \propto \Delta t / \Delta x$ and $P = M_\Omega^{-1} M_I$.
If $\|\kappa P\|$ is sufficiently small, we can apply the Maclaurin formula in order to
obtain the polynomial expansion for $M^{-1}$ as follows:

\[ M^{-1} = (I + \kappa P)^{-1} (\Lambda_\Omega M_\Omega)^{-1} = \Big( \sum_{n \ge 0} (-1)^n \kappa^n P^n \Big) (\Lambda_\Omega M_\Omega)^{-1}. \]

This representation reduces the inversion of the sparse matrix $M$ to the inversion
of its block-diagonal component $M_\Omega$ and the multiplication of the inverted block-diagonal
$M_\Omega$ by the sparse $M_I$. It provides an explicit way of solving the initial
linear system approximately. Even though it requires a CFL-type condition on
the value of $\|\kappa P\|$, which justifies the approximate solution of the system, it significantly
accelerates the execution of the algorithm.
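
The truncated expansion can be sketched on a small example. The code below is an illustration under simplifying assumptions, not the authors' solver: the block-diagonal part is taken to be purely diagonal (1x1 blocks, stored in the hypothetical list `diag_M_omega`), the coupling matrix `M_i` is a small dense matrix, and all names are invented for the example. It solves $M x = b$ approximately with `n_terms` terms of the series and measures the residual.

```python
def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def approx_solve(lam_omega, diag_M_omega, lam_i, M_i, b, n_terms):
    """Approximate x = M^{-1} b for M = lam_omega*M_omega + lam_i*M_i,
    with M_omega diagonal, using n_terms terms of the series in kappa*P."""
    kappa = lam_i / lam_omega
    # y = (lam_omega * M_omega)^{-1} b: only the diagonal part is inverted
    y = [bi / (lam_omega * d) for bi, d in zip(b, diag_M_omega)]
    term = list(y)
    x = list(y)
    for _ in range(1, n_terms):
        # term <- -kappa * P * term, with P = M_omega^{-1} M_i
        term = [-kappa * v / d for v, d in zip(matvec(M_i, term), diag_M_omega)]
        x = [xi + ti for xi, ti in zip(x, term)]
    return x


def residual_norm(lam_omega, diag_M_omega, lam_i, M_i, x, b):
    """Maximum norm of M x - b."""
    Mx = [lam_omega * d * xi + lam_i * v
          for d, xi, v in zip(diag_M_omega, x, matvec(M_i, x))]
    return max(abs(r - bi) for r, bi in zip(Mx, b))


diag_M_omega = [2.0, 2.0, 2.0]
M_i = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
b = [1.0, 2.0, 3.0]
errors = [residual_norm(1.0, diag_M_omega, 0.1, M_i,
                        approx_solve(1.0, diag_M_omega, 0.1, M_i, b, n), b)
          for n in (1, 3, 5)]
```

With $\kappa = 0.1$ the residual shrinks geometrically as terms are added; for larger $\kappa$ the series converges more slowly or not at all, which is the CFL-type restriction mentioned above.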

In Table 1 we compare the numerical accuracy ($L^2$-norm in time and space
of the numerical error) of the TDG method in a 2D
homogeneous acoustic case for both the exact and approximate matrix inversions
as a function of the mesh size and of the number $n$ of terms in the Taylor expansion.

Table 1 Accuracy ($L^2$-error in space and in time) of the solution when using the approximate
inversion with $n = 3, 4, 5$ and the exact inversion


1.4166e-05 4.3741e-05 2.8780e-04 2.5772e-03
3.1623e-07 1.2656e-06 5.3868e-05 1.2674e-03
2.8903e-07 9.1744e-07 4.1029e-05


The exact inversion ($\kappa = 10^{-2}$)


2.2540e-07 8.9583e-07 5.5811e-05 1.3004e-03


The accelerating factor is the ratio of the computational costs of the two methods for reaching the
same accuracy


3 Numerical Tests 


For the numerical tests of the Trefftz-DG method we have considered a
2D medium composed of two homogeneous rectangular layers: an acoustic one and
an elastodynamic one. We have set a source term at the fluid-solid interface, and two
receivers, one in the acoustic layer and one in the elastodynamic layer. The numerical signals
at both receivers have been validated against the analytical solutions computed with the
Gar6more code [5]. In Fig. 3 we show the convergence of the numerical velocity
as a function of cell size for different degrees of approximation ($p = 0, 1, 2, 3$)
computed at the receivers in (a) the 2D acoustic layer and (b) the 2D elastodynamic layer.
In each case, the convergence rate is higher than the corresponding approximation
degree. We refer to [2], where we provide more examples.
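
Convergence rates of this kind are usually read off as the slope of the error against the cell size in a log-log plot. A minimal sketch of that computation follows; the data are synthetic (an error that behaves exactly like $C \, \Delta x^3$) and are not taken from the experiments reported here.

```python
from math import log


def observed_rates(cell_sizes, errors):
    """Slopes of log(error) versus log(cell size) between consecutive meshes."""
    return [log(errors[i] / errors[i + 1]) / log(cell_sizes[i] / cell_sizes[i + 1])
            for i in range(len(errors) - 1)]


# Synthetic data obeying error = 2 * dx**3 (hypothetical, for illustration only)
cell_sizes = [0.1, 0.05, 0.025]
errors = [2.0 * dx ** 3 for dx in cell_sizes]
rates = observed_rates(cell_sizes, errors)
```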


Fig. 3 Convergence of the numerical velocity as a function of cell size $\Delta x$. (a) 2D acoustic layer. (b)
2D elastodynamic layer. Slopes given in the legend: (a) p=0: 0.5, p=1: 1.4, p=2: 2.9, p=3: 4.1; (b) p=0: 0.4, p=1: 1.8, p=2: 2.6, p=3: 4.1
4 Conclusion


The Trefftz-DG methodology for solving the first-order elasto-acoustic system has
demonstrated important advantages, such as the use of degrees of freedom
evaluated at the element faces only, the flexibility in the choice of the basis functions
and unconditional stability. However, in its initial form, it still shows some
limitations due to the space-time integration, which leads to the representation of
the discrete system by a huge sparse matrix whose straightforward inversion is
very expensive, even when using time slabs. We find ourselves in the situation of
using an implicit scheme for solving the forward problem, which risks overloading the
iterative process of the corresponding inverse problem when reconstructing very
large propagation domains. Fortunately, thanks to the decomposition of the matrix
that separates the time variables from the space ones, we can benefit from the
block-diagonal structure of the standard DG formulation and end up with an explicit
scheme that is more convenient from the numerical point of view. The
numerical tests performed clearly illustrate the interest of the split version of the discrete problem.
An hp-Adaptive Iterative Linearization
Discontinuous-Galerkin FEM
for Quasilinear Elliptic Boundary Value
Problems


Paul Houston and Thomas P. Wihler 


1 Introduction 


In this article, we consider the a posteriori error analysis, in a natural mesh-dependent
energy norm, for a class of interior-penalty hp-version discontinuous
Galerkin finite element methods (DGFEMs) for the numerical solution of the
following quasilinear elliptic boundary value problem:

\[ -\nabla \cdot (\mu(x, |\nabla u|) \nabla u) = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \Gamma. \qquad (1) \]

Here, $\Omega \subset \mathbb{R}^2$ is a bounded polygon with a Lipschitz continuous boundary $\Gamma$, and
$f \in L^2(\Omega)$, where for an open set $D \subseteq \Omega$, we signify by $L^2(D)$ the space of all
square integrable functions on $D$. Additionally, we assume that the nonlinearity $\mu$
satisfies the following assumptions: (A1) $\mu \in C^0(\overline{\Omega} \times [0, \infty))$; (A2) there exist
positive constants $m_\mu$, $M_\mu$ such that $m_\mu (t - s) \le \mu(x, t) t - \mu(x, s) s \le M_\mu (t - s)$,
$t \ge s \ge 0$, $x \in \overline{\Omega}$. We remark that, if $\mu$ satisfies (A2), there exist constants
$\beta \ge \alpha > 0$ such that for all vectors $v, w \in \mathbb{R}^2$ and all $x \in \overline{\Omega}$,

\[ |\mu(x, |v|) v - \mu(x, |w|) w| \le \beta |v - w|, \qquad
   \alpha |v - w|^2 \le (\mu(x, |v|) v - \mu(x, |w|) w) \cdot (v - w); \qquad (2) \]
see [14, Lemma 2.1]. For ease of notation, in the sequel, we will simply write $\mu(s)$
instead of $\mu(x, s)$, thereby suppressing the explicit dependence of $\mu$ on $x \in \overline{\Omega}$.
The weak formulation of (1) is to find $u \in H^1_0(\Omega)$ such that

\[ A(u; u, v) = (f, v)_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega), \qquad (3) \]

where, given $w \in H^1_0(\Omega)$, we define the bilinear form $A(w; u, v) =
\int_\Omega \mu(|\nabla w|) \nabla u \cdot \nabla v \, dx$, $u, v \in H^1_0(\Omega)$, as well as the $L^2(\Omega)$-inner product
$(v, w)_{L^2(\Omega)} = \int_\Omega v w \, dx$, $v, w \in L^2(\Omega)$. Here, $H^1_0(\Omega)$ is the standard Sobolev
space of first order, with zero trace along $\Gamma$, equipped with the norm $\|v\|_{H^1_0(\Omega)} =
\|\nabla v\|_{L^2(\Omega)}$, $v \in H^1_0(\Omega)$. Under the assumptions (A1)-(A2) above, it is elementary
to show that the form $A$ is strongly monotone and Lipschitz continuous in the sense
that

\[ A(u; u, u - v) - A(v; v, u - v) \ge \alpha \|u - v\|^2_{H^1_0(\Omega)} \qquad \forall u, v \in H^1_0(\Omega), \qquad (4) \]

and

\[ |A(u; u, v) - A(w; w, v)| \le \beta \|u - w\|_{H^1_0(\Omega)} \|v\|_{H^1_0(\Omega)} \qquad \forall u, v, w \in H^1_0(\Omega), \]

respectively. From these properties, classical monotone operator theory implies
existence and uniqueness of a solution of (3); see, e.g., [17, Theorem 3.3.23].
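
As an illustration of assumption (A2), consider the coefficient $\mu(t) = 2 + (1 + t)^{-1}$, which is the one used in the first numerical experiment of Sect. 3. A direct computation gives $\frac{d}{dt}(\mu(t) t) = 2 + (1 + t)^{-2} \in (2, 3]$, so (A2) should hold with $m_\mu = 2$ and $M_\mu = 3$; these two constants are our own computation, not values quoted from the references. The sketch below checks the inequality on a sample of points (the sampling range is arbitrary).

```python
def mu(t):
    """Nonlinearity mu(t) = 2 + 1/(1 + t)."""
    return 2.0 + 1.0 / (1.0 + t)


def satisfies_A2(mu, m_mu, M_mu, samples, tol=1e-12):
    """Check m_mu*(t-s) <= mu(t)*t - mu(s)*s <= M_mu*(t-s) for sampled t >= s >= 0."""
    for s in samples:
        for t in samples:
            if t < s:
                continue
            diff = mu(t) * t - mu(s) * s
            if not (m_mu * (t - s) - tol <= diff <= M_mu * (t - s) + tol):
                return False
    return True


samples = [0.1 * k for k in range(200)]        # sample points in [0, 20)
ok = satisfies_A2(mu, 2.0, 3.0, samples)       # constants derived above
too_tight = satisfies_A2(mu, 2.5, 3.0, samples)  # lower constant chosen too large
```

A sampled check of this kind cannot prove (A2), but it is a cheap way to catch a wrong choice of constants before they enter an error estimator.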

The exploitation of automatic adaptive hp-refinement algorithms has the potential
to compute numerical solutions to partial differential equations (PDEs) in
a highly efficient manner, often leading to exponential rates of convergence as 
the underlying finite element space is enriched; see, e.g., [11, 16]. The key tool 
required to design such strategies is the derivation of a posteriori estimates for 
the Galerkin discretization errors; in recent years such bounds have been extended 
to the context of linearization and/or linear solver errors, cf. [1, 2, 4, 5, 7, 9]. In 
the present article we consider the derivation of an hp-version a posteriori error 
bound for the DGFEM approximation of the second-order quasilinear elliptic PDE 
problem stated in (1). To this end, we employ the interior penalty DGFEM proposed 
in [10], cf. also [12], together with a discrete Kačanov iterative linearization
scheme, cf. [6]. Based on the analysis undertaken in [12], together with the use 
of a suitable reconstruction operator, cf. [13, 15], we derive a fully computable 
bound for the error, measured in terms of a suitable DGFEM energy norm, which 
separately accounts for the three main sources of error: discretization, linearization, 
and linear solver errors. On the basis of this a posteriori bound, we design and 
implement an hp-adaptive refinement algorithm which automatically controls each 
of these error contributions as the underlying finite element space is enriched. 
Numerical experiments highlighting the practical performance of the proposed 
adaptive strategy are presented. 
2 Iterative Discontinuous Galerkin Methods


2.1 Discrete hp-Discontinuous Galerkin Spaces


Let $\mathcal{T}_h$ be a partition of $\Omega$ into disjoint open and shape-regular elements $\kappa$ such that
$\overline{\Omega} = \bigcup_{\kappa \in \mathcal{T}_h} \overline{\kappa}$. We assume that each $\kappa \in \mathcal{T}_h$ is an affine image of a given master
element $\hat{\kappa}$, which is either the open triangle $\{(x, y) : -1 < x < 1, \ -1 < y < -x\}$
or the open square $(-1, 1)^2$ in $\mathbb{R}^2$. By $h_\kappa$ we denote the element diameter of $\kappa \in \mathcal{T}_h$,
and $n_\kappa$ signifies the unit outward normal vector to $\kappa$. We allow $\mathcal{T}_h$ to be 1-irregular,
i.e., each edge of any one element $\kappa \in \mathcal{T}_h$ contains at most one hanging node
(which, for simplicity, we assume to be the midpoint of the corresponding edge).
In this context, we suppose that $\mathcal{T}_h$ is regularly reducible (cf. [18, Section 7.1]
and [12]), i.e., there exists a shape-regular conforming (regular) mesh $\widetilde{\mathcal{T}}_h$ (consisting
of triangles and parallelograms) such that the closure of each element in $\mathcal{T}_h$ is
a union of closures of elements of $\widetilde{\mathcal{T}}_h$, and that there exists a constant $C > 0$,
independent of the element sizes, such that for any two elements $\kappa \in \mathcal{T}_h$ and $\tilde{\kappa} \in \widetilde{\mathcal{T}}_h$
with $\tilde{\kappa} \subseteq \kappa$ we have $h_\kappa / h_{\tilde{\kappa}} \le C$. Note that these assumptions imply that $\mathcal{T}_h$ is of
bounded local variation, i.e., there exists a constant $\rho_1 \ge 1$, independent of the
element sizes, such that $\rho_1^{-1} \le h_{\kappa_\sharp} / h_{\kappa_\flat} \le \rho_1$ for any pair of elements $\kappa_\sharp, \kappa_\flat \in \mathcal{T}_h$
which share a common edge $e = \partial\kappa_\sharp \cap \partial\kappa_\flat$. Moreover, let us consider the set $\mathcal{E}$
of all one-dimensional open edges of all elements $\kappa \in \mathcal{T}_h$. Further, we denote by
$\mathcal{E}_{\mathcal{I}}$ the set of all edges $e \in \mathcal{E}$ that are contained in the open domain $\Omega$ (interior
edges). Additionally, we introduce $\mathcal{E}_{\mathcal{B}}$ to be the set of boundary edges consisting of
all $e \in \mathcal{E}$ that are contained in $\Gamma$.

For any integer $p \in \mathbb{N}_0$, we denote by $\mathbb{P}_p(\kappa)$ the set of polynomials of total
degree $p$ on $\kappa$. Similarly, when $\kappa$ is a quadrilateral, we also consider $\mathbb{Q}_p(\kappa)$, the
set of all tensor-product polynomials on $\kappa$ of degree $p$ in each coordinate direction.
To each $\kappa \in \mathcal{T}_h$ we assign a polynomial degree $p_\kappa$ (local approximation order).
We collect the local polynomial degrees in a vector $\mathbf{p} = \{p_\kappa : \kappa \in \mathcal{T}_h\}$, and then
introduce the hp-DGFEM space

\[ V_{DG}(\mathcal{T}_h, \mathbf{p}) = \{ v \in L^2(\Omega) : v|_\kappa \in \mathbb{S}_{p_\kappa}(\kappa) \ \forall \kappa \in \mathcal{T}_h \}, \]

with $\mathbb{S}$ being either $\mathbb{P}$ or $\mathbb{Q}$. We shall suppose that the polynomial degree vector
$\mathbf{p}$, with $p_\kappa \ge 1$ for each $\kappa \in \mathcal{T}_h$, has bounded local variation, i.e., there exists a
constant $\rho_2 \ge 1$, independent of the local element sizes and $\mathbf{p}$, such that, for any
pair of neighbouring elements $\kappa_\sharp, \kappa_\flat \in \mathcal{T}_h$, we have $\rho_2^{-1} \le p_{\kappa_\sharp} / p_{\kappa_\flat} \le \rho_2$.

We also define the $L^2$-projection $\Pi_{\mathcal{T}_h, \mathbf{p}} : L^2(\Omega) \to V_{DG}(\mathcal{T}_h, \mathbf{p})$ by

\[ (\Pi_{\mathcal{T}_h, \mathbf{p}} v - v, w)_{L^2(\Omega)} = 0 \qquad \forall w \in V_{DG}(\mathcal{T}_h, \mathbf{p}). \]

Evidently, since functions in $V_{DG}(\mathcal{T}_h, \mathbf{p})$ do not need to be continuous, we have
that $\Pi_{\mathcal{T}_h, \mathbf{p}}|_\kappa = \Pi_{\kappa, p_\kappa}$ where, for $\kappa \in \mathcal{T}_h$, we let $\Pi_{\kappa, p_\kappa}$ be the $L^2$-projection
onto $\mathbb{S}_{p_\kappa}(\kappa)$.
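
The locality of the $L^2$-projection onto a discontinuous space can be seen in a one-dimensional analogue. The sketch below is only an illustration under simplifying assumptions (1D elements, polynomial degree 1 on every element, a Legendre basis and a 3-point Gauss rule); none of the names come from the paper. Each element is projected independently of its neighbours.

```python
from math import sqrt

# 3-point Gauss-Legendre rule on [-1, 1] (exact for polynomials of degree <= 5)
GAUSS = [(-sqrt(0.6), 5.0 / 9.0), (0.0, 8.0 / 9.0), (sqrt(0.6), 5.0 / 9.0)]
LEGENDRE = [lambda s: 1.0, lambda s: s]      # P_0 and P_1 on the reference interval


def project_element(f, a, b):
    """Legendre coefficients of the L2-projection of f onto degree-1 polynomials on [a, b]."""
    coeffs = []
    for k, P in enumerate(LEGENDRE):
        integral = sum(w * f(0.5 * (a + b) + 0.5 * (b - a) * s) * P(s)
                       for s, w in GAUSS)
        # orthogonality of Legendre polynomials: ||P_k||^2 = 2 / (2k + 1)
        coeffs.append((2 * k + 1) / 2.0 * integral)
    return coeffs


def project(f, nodes):
    """Global projection = independent local projections, one per element."""
    return [project_element(f, a, b) for a, b in zip(nodes[:-1], nodes[1:])]


def evaluate(coeffs, a, b, x):
    """Evaluate the local projection at a point x of the element [a, b]."""
    s = (2.0 * x - a - b) / (b - a)          # map x to the reference interval
    return sum(c * P(s) for c, P in zip(coeffs, LEGENDRE))


nodes = [0.0, 0.25, 1.0]                     # hypothetical non-uniform 1D mesh
local = project(lambda x: 3.0 * x - 1.0, nodes)
```

Since the projected function is itself linear, the projection reproduces it on every element, which gives a simple consistency check.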
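2.2 Nonlinear hp-DGFEM Formulation


Let $\kappa_\sharp$ and $\kappa_\flat$ be two adjacent elements of $\mathcal{T}_h$, and $x$ an arbitrary point on the
interior edge $e \in \mathcal{E}_{\mathcal{I}}$ given by $e = (\partial\kappa_\sharp \cap \partial\kappa_\flat)^\circ$. Furthermore, let $v$ and $q$
be scalar- and vector-valued functions, respectively, that are sufficiently smooth
inside each element $\kappa_\sharp$, $\kappa_\flat$. Then, the averages of $v$ and $q$ at $x \in e$ are given by
$\langle\!\langle v \rangle\!\rangle = \frac{1}{2}(v|_{\kappa_\sharp} + v|_{\kappa_\flat})$, $\langle\!\langle q \rangle\!\rangle = \frac{1}{2}(q|_{\kappa_\sharp} + q|_{\kappa_\flat})$, respectively. Similarly, the jumps of
$v$ and $q$ at $x \in e$ are given by $[\![ v ]\!] = v|_{\kappa_\sharp} n_{\kappa_\sharp} + v|_{\kappa_\flat} n_{\kappa_\flat}$, $[\![ q ]\!] = q|_{\kappa_\sharp} \cdot n_{\kappa_\sharp} + q|_{\kappa_\flat} \cdot n_{\kappa_\flat}$,
respectively. On a boundary edge $e \in \mathcal{E}_{\mathcal{B}}$, we set $\langle\!\langle v \rangle\!\rangle = v$, $\langle\!\langle q \rangle\!\rangle = q$ and $[\![ v ]\!] = v n$,
with $n$ denoting the unit outward normal vector on the boundary $\Gamma$.

Furthermore, we introduce the edge functions $\mathtt{h}, \mathtt{p} \in L^\infty(\mathcal{E})$, which, for an edge
$e \in \mathcal{E}$, are given by $\mathtt{h}|_e := h_e$ and $\mathtt{p}|_e := \langle\!\langle p \rangle\!\rangle|_e$, with $h_e$ denoting the length of $e$. In
addition, we define the discontinuity penalisation function $\sigma \in L^\infty(\mathcal{E})$ given by $\sigma =
\gamma \, \mathtt{p}^2 \mathtt{h}^{-1}$, where $\gamma \ge 1$ is a (sufficiently large) constant. Then, we equip the DGFEM
space $V_{DG}(\mathcal{T}_h, \mathbf{p})$ with the DGFEM norm $\|v\|^2_{DG} := \|\nabla_{\mathcal{T}_h} v\|^2_{L^2(\Omega)} + \int_{\mathcal{E}} \sigma |[\![ v ]\!]|^2 \, ds$,
$v \in V_{DG}(\mathcal{T}_h, \mathbf{p})$, where $\nabla_{\mathcal{T}_h}$ is the element-wise gradient operator.

With this notation, following [10], we introduce the interior penalty DGFEM
discretization of (3) by: find $u_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$ such that

\[ A_{DG}(u_{DG}; u_{DG}, v) = (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}), \qquad (5) \]

where, for given $w \in V_{DG}(\mathcal{T}_h, \mathbf{p})$, we define the DGFEM bilinear form

\[ A_{DG}(w; u, v) = \int_\Omega \mu(|\nabla_{\mathcal{T}_h} w|) \nabla_{\mathcal{T}_h} u \cdot \nabla_{\mathcal{T}_h} v \, dx
 - \int_{\mathcal{E}} \langle\!\langle \mu(|\nabla_{\mathcal{T}_h} w|) \nabla_{\mathcal{T}_h} u \rangle\!\rangle \cdot [\![ v ]\!] \, ds
 + \theta \int_{\mathcal{E}} \langle\!\langle \mu(\mathtt{h}^{-1} |[\![ w ]\!]|) \nabla_{\mathcal{T}_h} v \rangle\!\rangle \cdot [\![ u ]\!] \, ds
 + \int_{\mathcal{E}} \sigma [\![ u ]\!] \cdot [\![ v ]\!] \, ds, \qquad u, v \in V_{DG}(\mathcal{T}_h, \mathbf{p}), \]

where $\theta \in [-1, 1]$ is a method parameter. Referring to [10, Theorem 2.5], provided
that $\gamma \ge 1$ is chosen sufficiently large (independent of the local element sizes and
of the polynomial degree distribution), the existence and uniqueness of the DGFEM
solution $u_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$ satisfying (5) is guaranteed.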
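
The edge quantities entering the interior penalty form can be made concrete with a few lines of code. The sketch below assumes the usual definitions (average as the mean of the two traces, jump as the sum of trace times outward normal, and the penalty scaling $\sigma = \gamma \, \mathtt{p}^2 / \mathtt{h}$ with $\mathtt{p}$ the average of the neighbouring degrees); the function names and the sample values are hypothetical.

```python
def average(v_sharp, v_flat):
    """Average of the two traces on an interior edge."""
    return 0.5 * (v_sharp + v_flat)


def jump(v_sharp, v_flat, n_sharp):
    """Jump v_sharp*n_sharp + v_flat*n_flat of a scalar, using n_flat = -n_sharp."""
    return tuple((v_sharp - v_flat) * ni for ni in n_sharp)


def penalty(gamma, p_sharp, p_flat, h_e):
    """Penalty sigma = gamma * (average degree)**2 / h_e on an edge of length h_e."""
    return gamma * average(p_sharp, p_flat) ** 2 / h_e


avg = average(1.0, 3.0)                 # traces 1 and 3 on the two sides
jmp = jump(1.0, 3.0, (1.0, 0.0))        # normal of the first element points in +x
sigma = penalty(10.0, 2, 4, 0.5)        # gamma = 10, degrees 2 and 4, edge length 0.5
```

The penalty grows quadratically with the polynomial degree and inversely with the edge length, which is why $\gamma$ can be chosen independently of both.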


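Assumption 1 In the sequel, we suppose that there exists a computable a posteriori
error estimate of the form $\|u - u_{DG}\|_{DG} \le \eta(u_{DG}, f)$, where $u \in H^1_0(\Omega)$ is the
solution of (1), and $u_{DG}$ is its hp-DGFEM approximation defined in (5).

Remark 1 In the article [12, Theorem 3.2] it has been proved that such a bound
does indeed exist. More precisely, we have that

\[ \|u - u_{DG}\|_{DG} \le C \Big( \sum_{\kappa \in \mathcal{T}_h} \eta_\kappa^2 + \mathcal{O}(f, u_{DG}) \Big)^{1/2} =: \eta(u_{DG}, f), \qquad (6) \]

where the local error indicators $\eta_\kappa$, $\kappa \in \mathcal{T}_h$, are defined by

\[ \eta_\kappa^2 = h_\kappa^2 p_\kappa^{-2} \| \Pi_{\mathcal{T}_h, \mathbf{p}-1} (f + \nabla \cdot (\mu(|\nabla u_{DG}|) \nabla u_{DG})) \|^2_{L^2(\kappa)}
 + h_\kappa p_\kappa^{-1} \| \Pi_{\mathcal{E}, \mathbf{p}-1} [\![ \mu(|\nabla u_{DG}|) \nabla u_{DG} ]\!] \|^2_{L^2(\partial\kappa \setminus \Gamma)}
 + \gamma^2 h_\kappa^{-1} p_\kappa^3 \| [\![ u_{DG} ]\!] \|^2_{L^2(\partial\kappa)}, \qquad (7) \]

and $\mathcal{O}(f, u_{DG}) := \sum_{\kappa \in \mathcal{T}_h} \mathcal{O}^{(1)}_\kappa + \sum_{e \in \mathcal{E}_{\mathcal{I}}} \mathcal{O}^{(2)}_e$ is a data oscillation term. For $\kappa \in \mathcal{T}_h$
and $e \in \mathcal{E}_{\mathcal{I}}$, we have $\mathcal{O}^{(1)}_\kappa := h_\kappa^2 p_\kappa^{-2} \| (I - \Pi_{\mathcal{T}_h, \mathbf{p}-1})|_\kappa (f + \nabla \cdot (\mu(|\nabla u_{DG}|) \nabla u_{DG})) \|^2_{L^2(\kappa)}$
and $\mathcal{O}^{(2)}_e := h_e p_e^{-1} \| (I - \Pi_{\mathcal{E}, \mathbf{p}-1})|_e [\![ \mu(|\nabla_{\mathcal{T}_h} u_{DG}|) \nabla_{\mathcal{T}_h} u_{DG} ]\!] \|^2_{L^2(e)}$, where $I$ denotes
a generic identity operator. Here, we write $\mathbf{p} - 1 := \{p_\kappa - 1\}_{\kappa \in \mathcal{T}_h}$. Additionally,
we denote by $\Pi_{\mathcal{E}, \mathbf{p}-1}|_e$ the $L^2$-projector onto $\mathbb{P}_{p_e - 1}(e)$, where we let $p_e =
\max\{p_{\kappa_\sharp}, p_{\kappa_\flat}\}$, with $\kappa_\sharp, \kappa_\flat \in \mathcal{T}_h$, $e = \partial\kappa_\sharp \cap \partial\kappa_\flat$. Moreover, $C > 0$ in (6) is a constant
that is independent of the local element sizes, the polynomial degree vector $\mathbf{p}$, and
the parameters $\gamma$ and $\theta$.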


2.3 Iterative DGFEM 


In order to provide a practical solution scheme for the nonlinear hp-DGFEM
system (5) we propose a linearization approach based on a discrete Kačanov
fixed point iteration, see, e.g., [6]. To this end, we begin by selecting an initial
guess $u^0_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$. Then, for $n \ge 1$, given $u^{n-1}_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$, we solve
the linear hp-DGFEM formulation, defined by

\[ A_{DG}(u^{n-1}_{DG}; u^n_{DG}, v) = (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}), \qquad (8) \]

for $u^n_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$. We emphasize that, in actual computations, the linear system (8)
may be solved by an iterative algorithm, thereby generating an approximate
numerical solution $\tilde{u}^n_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$, with $\tilde{u}^n_{DG} \approx u^n_{DG}$. This means that, in practice,
instead of computing the sequence $\{u^n_{DG}\}_{n \ge 0}$ obtained from the iteration (8), an
inexact sequence $\{\tilde{u}^n_{DG}\}_{n \ge 0}$ is generated such that

\[ A_{DG}(\tilde{u}^{n-1}_{DG}; \tilde{u}^n_{DG}, v) \approx (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}). \qquad (9) \]
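
The structure of the Kačanov iteration (freeze the nonlinear coefficient at the previous iterate, then solve a linear problem) can be sketched on a one-dimensional model. The code below is an illustration under stated assumptions, not the hp-DGFEM scheme of this article: it uses a finite difference discretization of $-(\mu(|u'|) u')' = f$ on $(0, 1)$ with homogeneous Dirichlet conditions, an exact tridiagonal solve instead of an iterative linear solver, and hypothetical function names.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def kacanov(mu, f, n_cells, n_iter):
    """Kacanov iteration for -(mu(|u'|) u')' = f on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / n_cells
    u = [0.0] * (n_cells + 1)                 # initial guess u^0 = 0
    increments = []
    for _ in range(n_iter):
        # freeze the coefficient at the previous iterate, one value per cell
        k = [mu(abs(u[i + 1] - u[i]) / h) for i in range(n_cells)]
        m = n_cells - 1                       # number of interior nodes
        lower = [-k[i] / h ** 2 for i in range(m)]
        diag = [(k[i] + k[i + 1]) / h ** 2 for i in range(m)]
        upper = [-k[i + 1] / h ** 2 for i in range(m)]
        rhs = [f((i + 1) * h) for i in range(m)]
        new = [0.0] + thomas(lower, diag, upper, rhs) + [0.0]
        increments.append(max(abs(a - b) for a, b in zip(new, u)))
        u = new
    return u, increments


u, increments = kacanov(lambda t: 2.0 + 1.0 / (1.0 + t), lambda x: 1.0, 16, 8)
```

The list `increments` records the maximum change between consecutive iterates; for this mildly nonlinear coefficient it decays quickly, which is the behaviour a linearization indicator is meant to monitor.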


From a mathematical viewpoint, this (inexact) iterative linearization DGFEM
approach gives rise to three different sources of error:


1. Discretization error, which is expressed by the residual

\[ \rho^n_{DG}(v) := A_{DG}(\tilde{u}^n_{DG}; \tilde{u}^n_{DG}, v) - (f, v)_{L^2(\Omega)}. \qquad (10) \]

2. Linearization error, which is given in terms of the residual $\psi^n_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$:

\[ (\psi^n_{DG}, v)_{L^2(\Omega)} = A_{DG}(\tilde{u}^n_{DG}; \tilde{u}^n_{DG}, v) - A_{DG}(\tilde{u}^{n-1}_{DG}; \tilde{u}^n_{DG}, v) \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}). \qquad (11) \]

We observe that, if (1) is linear, then we immediately obtain $\psi^n_{DG} = 0$.

3. Linear solver error, which is described by a residual $\lambda^n_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$:

\[ (\lambda^n_{DG}, v)_{L^2(\Omega)} = A_{DG}(\tilde{u}^{n-1}_{DG}; \tilde{u}^n_{DG}, v) - (f, v)_{L^2(\Omega)} \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}). \qquad (12) \]

Note that, if (8) is solved exactly, then we have $\tilde{u}^{n-1}_{DG} = u^{n-1}_{DG}$ and $\tilde{u}^n_{DG} = u^n_{DG}$, and
it follows that $\lambda^n_{DG} = 0$.


Remark 2 Since functions in $V_{DG}(\mathcal{T}_h, \mathbf{p})$ need not be continuous along element
interfaces, the linearization and linear solver residuals $\psi^n_{DG}$ and $\lambda^n_{DG}$, respectively, can be
computed elementwise, i.e., in parallel, and, hence, at a low computational cost.


The aim of the analysis in the following section is to investigate the above
residuals, and then to provide a computable a posteriori error estimate for the
error $\|u - \tilde{u}^n_{DG}\|_{DG}$ between the solution $u$ of (1) and $\tilde{u}^n_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$.


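2.4 A Posteriori Error Estimation


In order to bound the residual $\rho^n_{DG}$ in (10), we apply an elliptic reconstruction
technique along the lines of the works [13, 15], see also [7]. Specifically, we define
an auxiliary function $\hat{u}^n \in H^1_0(\Omega)$ to be the unique solution of the weak formulation

\[ A(\hat{u}^n; \hat{u}^n, v) = (f + \psi^n_{DG} + \lambda^n_{DG}, v)_{L^2(\Omega)} \qquad \forall v \in H^1_0(\Omega), \]

where $\psi^n_{DG}$ and $\lambda^n_{DG}$ are the linearization and linear solver residuals from (11)
and (12), respectively. Upon adding (11) and (12), we notice that

\[ A_{DG}(\tilde{u}^n_{DG}; \tilde{u}^n_{DG}, v) = (f + \psi^n_{DG} + \lambda^n_{DG}, v)_{L^2(\Omega)} \qquad \forall v \in V_{DG}(\mathcal{T}_h, \mathbf{p}). \]

In particular, we observe that $\tilde{u}^n_{DG}$ is the DGFEM approximation of $\hat{u}^n$ based on
employing the (nonlinear) DGFEM scheme defined in (5). Hence, we may
exploit the a posteriori error estimate in Assumption 1 to infer the computable
bound

\[ \|\hat{u}^n - \tilde{u}^n_{DG}\|_{DG} \le \eta(\tilde{u}^n_{DG}, f + \psi^n_{DG} + \lambda^n_{DG}). \qquad (13) \]

We now turn to bounding the elliptic reconstruction error $u - \hat{u}^n \in H^1_0(\Omega)$; to
this end, we first observe that $\|u - \hat{u}^n\|_{DG} = \|u - \hat{u}^n\|_{H^1_0(\Omega)}$. Then, employing the
strong monotonicity property (4), and recalling the weak formulation (3), we obtain

\[ \alpha \|u - \hat{u}^n\|^2_{H^1_0(\Omega)} \le A(u; u, u - \hat{u}^n) - A(\hat{u}^n; \hat{u}^n, u - \hat{u}^n)
 = -(\psi^n_{DG}, u - \hat{u}^n)_{L^2(\Omega)} - (\lambda^n_{DG}, u - \hat{u}^n)_{L^2(\Omega)}. \]

Employing the Cauchy-Schwarz inequality, together with the Poincaré-Friedrichs
inequality, $\|v\|_{L^2(\Omega)} \le C_{PF} \|\nabla v\|_{L^2(\Omega)}$ for all $v \in H^1_0(\Omega)$, where $C_{PF} > 0$ is a
constant, the right-hand side is bounded by $C_{PF} (\|\psi^n_{DG}\|_{L^2(\Omega)} + \|\lambda^n_{DG}\|_{L^2(\Omega)}) \|u - \hat{u}^n\|_{H^1_0(\Omega)}$,
and we deduce that

\[ \|u - \hat{u}^n\|_{DG} \le \Psi^n_{DG} + \Lambda^n_{DG}, \qquad (14) \]

where the linearization and linear solver residuals are given, respectively, by

\[ \Psi^n_{DG} = C_{PF} \alpha^{-1} \Big( \sum_{\kappa \in \mathcal{T}_h} \|\psi^n_{DG}\|^2_{L^2(\kappa)} \Big)^{1/2}, \qquad
   \Lambda^n_{DG} = C_{PF} \alpha^{-1} \Big( \sum_{\kappa \in \mathcal{T}_h} \|\lambda^n_{DG}\|^2_{L^2(\kappa)} \Big)^{1/2}. \]


Summarizing the above analysis leads to the following result.


Theorem 1 Suppose that Assumption 1 is satisfied. Then, given a sequence of
(possibly inexact) DGFEM approximations $\{\tilde{u}^n_{DG}\}_{n \ge 0} \subset V_{DG}(\mathcal{T}_h, \mathbf{p})$, cf. (9), for $n \ge 1$,
the following a posteriori error bound holds:

\[ \|u - \tilde{u}^n_{DG}\|_{DG} \le \eta(\tilde{u}^n_{DG}, f + \psi^n_{DG} + \lambda^n_{DG}) + \Psi^n_{DG} + \Lambda^n_{DG}. \]

Here, $u$ is the analytical solution of (1), $\psi^n_{DG}$ and $\lambda^n_{DG}$ are the residuals defined in (11)
and (12), respectively, and $\alpha > 0$ is the constant occurring in (2) and (4).


Proof The result follows immediately upon application of the triangle inequality,
i.e., $\|u - \tilde{u}^n_{DG}\|_{DG} \le \|u - \hat{u}^n\|_{DG} + \|\hat{u}^n - \tilde{u}^n_{DG}\|_{DG}$, and inserting the bounds (13)
and (14).


Remark 3 We note that the above analysis naturally applies to other finite element
schemes, provided that Assumption 1 is satisfied.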


2.5 Adaptive Iterative hp-DGFEM Procedure 


In this section we introduce an automatic hp-refinement algorithm which ensures
that each of the three components of the error, namely discretization, linearization,
and linear solver, is controlled in a suitable fashion. To this end, we propose the
following strategy, cf. [9].


Algorithm 1 Given a (coarse) starting mesh $\mathcal{T}_h$, with an associated (low-order)
polynomial degree distribution $\mathbf{p}$, and an initial guess $\tilde{u}^0_{DG} \in V_{DG}(\mathcal{T}_h, \mathbf{p})$.
Set $n \leftarrow 1$.


1: Compute the DGFEM solution $\tilde{u}^n_{DG}$ from (9) based on employing an iterative
linear solver. Furthermore, evaluate the corresponding error indicators
$\eta(\tilde{u}^n_{DG}, f + \psi^n_{DG} + \lambda^n_{DG})$, $\Psi^n_{DG}$ and $\Lambda^n_{DG}$.


2: if

\[ \Psi^n_{DG} + \Lambda^n_{DG} \le \Upsilon \, \eta(\tilde{u}^n_{DG}, f + \psi^n_{DG} + \lambda^n_{DG}) \qquad (15) \]

holds, for some given parameter $\Upsilon > 0$, then hp-adaptively refine the space
$V_{DG}(\mathcal{T}_h, \mathbf{p})$; go back to step (1:) with the new mesh $\mathcal{T}_h$ (and based on the
previously computed solution $\tilde{u}^n_{DG}$ interpolated on the refined mesh).

3: else, i.e., if (15) is not fulfilled, then set $n \leftarrow n + 1$, and perform another
linearization step by going back to (1:).

4: end if


In Step 2 of Algorithm 1, if (15) is fulfilled then the space $V_{DG}(\mathcal{T}_h, \mathbf{p})$ is
adaptively hp-refined based on first marking elements for refinement according
to the size of the local element indicators $\eta_\kappa$, cf. (7). To this end, we exploit the
maximal strategy whereby elements are marked for refinement which satisfy the
condition $\eta_\kappa > \frac{1}{3} \max_{\kappa \in \mathcal{T}_h} \eta_\kappa$. Secondly, once an element $\kappa \in \mathcal{T}_h$ has been marked
for refinement, we undertake either local mesh subdivision or local polynomial
enrichment based on employing the hp-refinement criterion developed within the
article [8]. Finally, when (15) is not fulfilled, rather than determining which source
of error, i.e., the (computable) quantities $\Psi^n_{DG}$ or $\Lambda^n_{DG}$, from (11) and (12), respectively,
is dominant, we choose to always undertake a further linearization step, and hence
a further linear solver step is also computed, since this ensures that the most up-to-date
approximation $\tilde{u}^n_{DG}$ is employed at all times.
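
The two decisions made in each pass of the adaptive loop, namely whether to refine or to iterate and which elements to mark, can be sketched as follows. This is a schematic illustration with hypothetical function names and made-up indicator values; the actual hp-refinement criterion of [8] and the evaluation of the indicators are not reproduced.

```python
def next_action(psi, lam, eta, upsilon):
    """Refine if the linearization and solver indicators satisfy
    psi + lam <= upsilon * eta; otherwise perform another linearization step."""
    return "refine" if psi + lam <= upsilon * eta else "iterate"


def mark_maximal(indicators, fraction=1.0 / 3.0):
    """Indices of elements whose indicator exceeds fraction * (largest indicator)."""
    threshold = fraction * max(indicators)
    return [k for k, eta_k in enumerate(indicators) if eta_k > threshold]


# made-up indicator values, for illustration only
action_a = next_action(psi=1e-4, lam=1e-5, eta=1e-2, upsilon=0.25)
action_b = next_action(psi=5e-3, lam=1e-3, eta=1e-2, upsilon=0.25)
marked = mark_maximal([0.9, 0.2, 0.31, 0.05])
```

In the first call the linearization and solver contributions are small compared with the discretization indicator, so the space is refined; in the second they dominate, so another linearization step is taken instead.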


3 Application to Quasilinear Elliptic PDEs 


In this section we present numerical experiments to highlight the performance of the
proposed iterative hp-refinement procedure outlined in Algorithm 1. To this end, we
set the interior penalty parameter constant $\gamma$ to 10 and the steering parameter $\Upsilon$ to
1/4. The solution of the resulting set of linear equations is computed using an ILU(0)
preconditioned GMRES algorithm.

For the first numerical experiment, we let $\Omega = (0, 1)^2$ and define the nonlinear
coefficient as $\mu(|\nabla u|) = 2 + (1 + |\nabla u|)^{-1}$. The right-hand forcing function $f$
is selected so that the analytical solution to (1) is given by
$u(x, y) = x(1 - x) y (1 - y)(1 - 2y) e^{-20(2x - 1)^2}$. In Fig. 1 we present a comparison of the actual
error measured in terms of the energy norm versus the square root of the number of
degrees of freedom in $V_{DG}(\mathcal{T}_h, \mathbf{p})$. From Fig. 1a we clearly observe exponential
convergence of the proposed hp-refinement strategy as $V_{DG}(\mathcal{T}_h, \mathbf{p})$ is enriched.
Furthermore, in Fig. 1b we plot the individual residual error indicators; for this
smooth problem, we notice that the discretization indicator (denoted as $\eta^n$ in
the figure) is always dominant, while the linearization and linear solver residuals
(denoted as $\Psi^n$ and $\Lambda^n$, respectively) are roughly of the same magnitude.

Fig. 1 Example 1. (a) Comparison of the DGFEM norm of the error and the a posteriori bound,
with respect to the square root of the number of degrees of freedom; (b) individual error estimators

Secondly, we let $\Omega$ denote the L-shaped domain $(-1, 1)^2 \setminus [0, 1) \times (-1, 0] \subset \mathbb{R}^2$
and select $\mu(|\nabla u|) = 1 + \exp(-|\nabla u|^2)$. By writing $(r, \varphi)$ to denote the system
of polar coordinates, we choose the forcing function $f$ and an inhomogeneous
boundary condition such that the analytical solution to (1) is $u = r^{2/3} \sin(\frac{2}{3} \varphi)$,
cf. [3]. In Fig. 2 we now present a comparison of the actual error measured in terms
of the energy norm versus the third root of the number of degrees of freedom in
$V_{DG}(\mathcal{T}_h, \mathbf{p})$; as before, we again attain exponential convergence of the proposed
hp-refinement strategy as $V_{DG}(\mathcal{T}_h, \mathbf{p})$ is adaptively refined, though convergence
of the a posteriori error estimator is no longer monotonic. Indeed, from Fig. 2b,
we observe that once an hp-mesh refinement has been undertaken, several
linearization/solver steps may be required to ensure that the numerical solution
has been computed to a sufficient accuracy before further refinements may be
undertaken.

Fig. 2 Example 2. (a) Comparison of the DGFEM norm of the error and the a posteriori bound,
with respect to the third root of the number of degrees of freedom; (b) individual error estimators


4 Conclusions 


In this article we have derived a computable hp-version a posteriori error bound 
for the DGFEM approximation of a second-order quasilinear elliptic PDE problem, 
whereby a discrete Kačanov iterative linearization scheme is employed. The 
resulting computable upper bound directly takes into account discretization error, 
as well as the errors stemming from linearization and the underlying linear solver. 
Numerical experiments highlighting the performance of this bound within an 
automatic hp-refinement algorithm are presented.


Acknowledgements TW acknowledges the support of the Swiss National Science Foundation 
(SNF), Grant No. 200021_162990.
Erosion Wear Evaluation Using
Nektar++


Manuel F. Mejía, Douglas Serson, Rodrigo C. Moura, Bruno S. Carmo, 
Jorge Escobar-Vargas, and Andrés González-Mancera 


1 Introduction 


Wear is a common phenomenon in many machines and devices; it is characterised
by the removal or loss of material. Erosion wear is a particular wear process which
occurs when solid particles or droplets, carried by a fluid (liquid or gas), impact on
a solid surface [1]. Turbomachinery such as pumps and turbines, and pipe accessories
(e.g. tees, elbows, nozzles, valves), are examples of elements affected by erosion
wear, which decreases their performance and lifetime. In many industrial sectors, e.g.
energy, mining, and oil & gas, massive amounts of resources are used for
maintenance and replacement of affected parts [2-4]. Although this phenomenon has
been broadly investigated [5-14], there are still unsolved challenges in establishing
the influence of small eddies during the erosion process, leading to modest accuracy
levels in the simulation results.
Due to the microscopic nature of erosion, the smallest scales in the flow play a
fundamental role in the complete process. One of the aspects which has not been
carefully studied in erosion wear modelling is the effect that the smaller eddies
and secondary flows have on the particles' interactions with the surface. In general,
these secondary flows cannot be represented using linear Reynolds-Averaged
Navier-Stokes (RANS) simulations, mainly because the Reynolds stress
imbalance is neglected and the secondary flow does not develop. As mentioned
by Gross and Fasel [15], predictions of the secondary flow require non-linear
Reynolds-stress models, full Reynolds-stress models, Large Eddy Simulations (LES) or
Direct Numerical Simulation (DNS). Due to their relatively low computational cost,
RANS models are often used to predict erosion using CFD in industrial simulations.
The inclusion of smaller eddies and secondary flows in the simulation could be
a major breakthrough in the modelling of the erosion process. In order to accurately
capture the physics related to the small eddies and secondary flows, a
numerical technique capable of representing those processes is needed. As emphasised
by Jacobs [16], the use of spectral methods could allow increased accuracy in the
simulation due to their potential to simulate a wider range of scales. With this in mind,
the purpose of this work is to assess the impact of higher resolution methods on the
prediction of erosion wear rate and distribution.


1.1 Spectral Methods 


Several numerical techniques are used to solve the Navier-Stokes (NS) equations,
among them finite differences, finite volumes and finite elements. Nevertheless, when
high accuracy is required, a large number of elements is needed in the model,
which significantly increases the computational cost [17, 18]. Hence novel methods
are the subject of research aiming at a better ratio of accuracy to computational cost.

Among the novel numerical methods considered nowadays are spectral methods,
which have been shown to be a powerful tool with a high level of accuracy for solving
large problems in computational fluid dynamics (CFD), according to the available
literature, especially the studies by Boyd [19], Canuto et al. [20-22],
Trefethen [23, 24] and Sherwin [25, 26]. Nektar++ is an open-source software
framework designed to support the development of high-performance scalable
solvers for partial differential equations using the spectral/hp element method [27].
High-order CFD methods have been receiving considerable attention in the past two
decades. Traditional CFD software could be replaced by high-order codes in many
applications within a few years [28].


1.2 Particles Tracking 


To the best of the authors' knowledge, there is no work that uses high order
methods to evaluate erosion wear rate. This research aims to assess the impact of
higher resolution methods on the prediction of erosion wear rate and distribution. It
comprises the solution of the fluid flow using an incompressible NS solver with implicit
LES modelling, the implementation of a Lagrangian particle tracking model and the
subsequent data processing through traditional erosion rate models, but using the
available high order information. This could allow the evaluation of traditional rate
models with more spatial resolution and accuracy.

The Lagrangian particle tracking model is based on a one-way coupling approach,
which is the simplest case: the interaction between the fluid and each
particle is taken into account in one direction only. That means that the particles are
moved by the fluid but the fluid flow is not perturbed by the particles. Moreover, the
effects of collisions between particles are also neglected. The one-way coupling
model is valid for volume concentrations of particles lower than $10^{-6}$ [29, 30].

The problem of predicting particle motion in a fluid flow can be solved by
integrating an evolution equation in time:


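$$\frac{d\mathbf{v}_p}{dt} = F(\mathbf{u}, \rho, \rho_p, C_d, \ldots), \qquad \frac{d\mathbf{x}_p}{dt} = \mathbf{v}_p \tag{1}$$

where $\mathbf{v}_p$ and $\mathbf{x}_p$ are the particle velocity and position and $F$ is a function of the
velocity of the fluid $\mathbf{u}$, the density of the fluid $\rho$ and the particle density $\rho_p$, among
others.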

To start, it is necessary to obtain the velocity at a certain point from the Eulerian
velocity field. This process consists of finding the element containing the particle
and interpolating the velocity with the element information. In a higher-order
velocity field, the use of linear interpolation is inaccurate and could cancel
the advantage gained with the use of high order methods. On the other hand, using high
order interpolation could be computationally expensive. Therefore, special attention
to this procedure is required [16, 31, 32].
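
The trade-off described above can be illustrated with a minimal, self-contained sketch. It is not the Nektar++ implementation: the node set (Chebyshev-Gauss-Lobatto points on the reference interval), the sample field sin(pi*xi), the particle location and all function names are assumptions chosen for illustration. The sketch compares linear interpolation between the two nearest nodes with the full polynomial interpolant of a one-dimensional element of order P = 8.

```python
import numpy as np

def lagrange_interpolate(nodes, values, xi):
    """Evaluate the degree-P Lagrange interpolant through (nodes, values) at xi."""
    result = 0.0
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        # j-th Lagrange basis polynomial evaluated at xi
        result += values[j] * np.prod((xi - others) / (xj - others))
    return float(result)

def linear_interpolate(nodes, values, xi):
    """Piecewise-linear interpolation between the two nodes that bracket xi."""
    order = np.argsort(nodes)  # np.interp needs increasing abscissae
    return float(np.interp(xi, nodes[order], values[order]))

P = 8                                           # assumed element order
nodes = np.cos(np.pi * np.arange(P + 1) / P)    # assumed nodal points on [-1, 1]

def field(xi):
    """Hypothetical smooth velocity component on the reference element."""
    return np.sin(np.pi * xi)

values = field(nodes)
xi_particle = 0.3                               # hypothetical particle location
exact = float(field(xi_particle))

err_high = abs(lagrange_interpolate(nodes, values, xi_particle) - exact)
err_linear = abs(linear_interpolate(nodes, values, xi_particle) - exact)
print(f"high-order error: {err_high:.2e}, linear error: {err_linear:.2e}")
```

The direct Lagrange formula costs O(P^2) operations per evaluation point in one dimension, which hints at why the cost of high order interpolation needs attention when many particles are tracked.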


1.3 Erosion Wear Evaluation 


Once the information about the collisions is complete, the erosion wear model is
used to predict the pattern of material removed. The general erosion equation, based
on the work of Finnie [33-38], can be presented as

$$W = k F_s V_p^n f(\theta) \tag{2}$$

where $W$ is the erosion rate or material removed by collision, $k$ is a
wall-material-dependent constant, $F_s$ is the particle geometric factor, $V_p$ and $n$
are the collision velocity and the velocity exponent, and $f(\theta)$ is a function of the
collision angle $\theta$. Several authors define these values for different material
configurations and test cases. Three of the most used sources of models that include
experimental results are the jet impingement test [39-45], elbow erosion [46-49],
and the works of Wong et al. [4, 46, 50, 51].
2 Implementation


This section describes the implementation of erosion wear in Nektar++. To achieve
this objective, it is important to keep in mind that the problem is split into two
parts. The first one is the particle tracking, implemented as a filter within the
Nektar++ incompressible Navier-Stokes solver. A filter in Nektar++ is a module for
calculating a variety of useful quantities from the field variables as the solution
evolves in time [27].

The second one is implemented as a FieldConvert module to evaluate the erosion
of each collision and generate the fields on the boundary walls. FieldConvert is a
utility embedded in Nektar++ with the primary aim of allowing the user to work
with the Nektar++ output files; some of the modules within FieldConvert allow the
user to postprocess the output data [27].


2.1 Particles Tracking 


The first step was the implementation of an ODE time solver. Several options are
available, but taking into account the discrete-time flow fields calculated with the
incompressible Navier-Stokes solver, and to avoid the use of temporal interpolation,
the selected options were the Adams-Bashforth (AB) and Adams-Moulton (AM)
schemes.

The implementation was tested with a benchmark case presented in [31]. In this
model, the particle velocity is the fluid velocity at a certain point, so that
Eq. 1 reduces to the single equation:

$$\frac{d\mathbf{x}_p}{dt} = \mathbf{u} \tag{3}$$


To solve this system a time-marching method was implemented, meaning that
the future values are evaluated using the present and past values of the variables.
Explicit AB and implicit AM methods were implemented using first to fourth
integration order. The errors obtained using AB and AM at different orders
show the behaviour expected from this kind of method.
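
A minimal sketch of such a time-marching scheme is given below. It is an illustration under stated assumptions, not the filter implemented in Nektar++: the velocity field (solid-body rotation), the function names and the forward Euler start-up step are all hypothetical choices. It integrates Eq. 3 with the second-order Adams-Bashforth formula and checks that halving the time step reduces the position error by roughly a factor of four.

```python
import numpy as np

def velocity(x):
    """Hypothetical steady fluid velocity: solid-body rotation about the origin."""
    return np.array([-x[1], x[0]])

def track_ab2(x0, dt, n_steps):
    """Advance a tracer particle with the second-order Adams-Bashforth scheme."""
    x = np.asarray(x0, dtype=float)
    u_prev = velocity(x)
    x = x + dt * u_prev                 # forward Euler start-up step
    for _ in range(n_steps - 1):
        u_now = velocity(x)
        # AB2: x_{n+1} = x_n + dt * (3/2 u_n - 1/2 u_{n-1})
        x = x + dt * (1.5 * u_now - 0.5 * u_prev)
        u_prev = u_now
    return x

def position_error(dt, t_end=1.0):
    """Distance between the tracked and the exact particle position at t_end."""
    n_steps = int(round(t_end / dt))
    x_num = track_ab2([1.0, 0.0], dt, n_steps)
    x_exact = np.array([np.cos(t_end), np.sin(t_end)])
    return float(np.linalg.norm(x_num - x_exact))

ratio = position_error(0.01) / position_error(0.005)
print(f"error ratio when halving dt: {ratio:.2f}")
```

Because only velocities at the current and previous time levels are used, no temporal interpolation of the flow field is needed, which matches the motivation given above for choosing multistep schemes.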

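The next step was the implementation of the solid particles. In this case, the
momentum equation is evaluated on each particle, resulting in:

$$\frac{d\mathbf{v}_p}{dt} = F_D(\mathbf{u} - \mathbf{v}_p) + \mathbf{g}\,\frac{\rho_p - \rho}{\rho_p} \tag{4}$$

$$F_D = \frac{18\mu}{\rho_p d_p^2}\,\frac{C_d Re_p}{24} \tag{5}$$

$$Re_p = \frac{\rho\, d_p\, |\mathbf{u} - \mathbf{v}_p|}{\mu} \tag{6}$$

$$C_d = \begin{cases} 24/Re_p, & Re_p < 0.5 \\ 24/Re_p\,(1 + 0.15\, Re_p^{0.687}), & 0.5 < Re_p < 1000 \\ 0.44, & Re_p > 1000 \end{cases} \tag{7}$$

where $F_D$ is the drag force term, $Re_p$ is the Reynolds number based on the diameter
$d_p$ of the particle, $\mu$ is the dynamic viscosity of the fluid, $\mathbf{g}$ is the gravitational
acceleration, and $C_d$ is the drag coefficient evaluated on each particle.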
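
The piecewise drag law can be sketched as a small function. This is an illustrative reading of the drag correlation, assuming the standard Schiller-Naumann constants 0.15 and 0.687 in the intermediate regime; the function name and the treatment of the regime boundaries (which the strict inequalities in the text leave open) are assumptions.

```python
def drag_coefficient(re_p):
    """Drag coefficient of a sphere as a function of the particle Reynolds number.

    Assumed regimes: Stokes drag below Re_p = 0.5, a Schiller-Naumann type
    correction up to Re_p = 1000, and a constant value above.
    """
    if re_p <= 0.0:
        raise ValueError("particle Reynolds number must be positive")
    if re_p < 0.5:
        return 24.0 / re_p
    if re_p <= 1000.0:
        return 24.0 / re_p * (1.0 + 0.15 * re_p ** 0.687)
    return 0.44

for re_p in (0.1, 100.0, 5000.0):
    print(re_p, drag_coefficient(re_p))
```

Note that the three branches of this correlation are not continuous at the regime boundaries, so a tracked particle crossing Re_p = 0.5 or Re_p = 1000 sees a small jump in drag.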

Figure 1 shows a diagram of the evolution equation. Current position, velocities
and forces are evaluated to get the future positions (BP, OP) until the next position
is located outside of the domain (NP). When this happens, the evolution algorithm
stops and a function is used to evaluate the collision point (CP) and the position after
the collision (NP') using the high order information about the walls.


2.2 Erosion Wear Evaluation 


The erosion rate per collision (Eq. 2) has to be integrated over each element of the
eroded surface. With each particle collision, more material is removed from the surface,
so the elemental erosion rate has to take into account this cumulative effect over the
surface.

As mentioned before, the set of parameters used in this work is based
on experimental data. One of the most used parameter sets is the one proposed by
the Erosion group of the University of Tulsa [38, 48, 52]. The erosion rate takes the
form of Eq. 2, with $F_s = 1$ for sharp (angular), 0.53 for semi-rounded, or 0.2 for fully
rounded sand particles. $V_p$ is the impact velocity and $n = 1.73$. The angle function
has the form:

$$f(\theta) = \begin{cases} a\theta^2 + b\theta, & \text{for } \theta \le \phi \\ x\cos^2(\theta)\sin(w\theta) + y\sin^2(\theta) + z, & \text{for } \theta > \phi \end{cases} \tag{8}$$


Fig. 1 Evolution of particle tracking 
All the parameters and empirical constants depend on the material being eroded.
For velocity in ft/s, the steel-sand parameters are: $a = 38.4$, $b = 22.7$, $\phi = 1$,
$x = 0.3147$, $y = 0.03609$, $w = 0.2532$ and $z = 0$ [53].
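
As a rough sketch, the general erosion equation and a piecewise angle function of this type could be evaluated per collision as follows. The function names are hypothetical, all model constants are passed in by the caller, and the numbers in the usage lines are made-up values chosen only to make the arithmetic easy to follow; they are not the steel-sand parameters.

```python
import math

def angle_function(theta, a, b, phi, x, y, w, z):
    """Piecewise angle function: polynomial below phi, trigonometric above."""
    if theta <= phi:
        return a * theta**2 + b * theta
    return (x * math.cos(theta)**2 * math.sin(w * theta)
            + y * math.sin(theta)**2 + z)

def erosion_per_collision(k, f_s, v_p, n, f_theta):
    """General erosion equation W = k * F_s * V_p**n * f(theta)."""
    return k * f_s * v_p**n * f_theta

# Usage with made-up constants (assumptions for illustration only).
f_low = angle_function(0.5, a=2.0, b=1.0, phi=1.0, x=0.0, y=0.0, w=1.0, z=0.0)
f_high = angle_function(math.pi / 2, a=2.0, b=1.0, phi=1.0,
                        x=1.0, y=3.0, w=1.0, z=0.5)
w_single = erosion_per_collision(k=2.0, f_s=0.5, v_p=3.0, n=2.0, f_theta=f_low)
print(f_low, f_high, w_single)
```

Summing `erosion_per_collision` over all collisions recorded on a wall element would give the cumulative quantity that is integrated over the eroded surface.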


3 Test Case 


To test the new feature in Nektar++, a Backward Facing Step (BFS) model was
developed based on the experimental setup of [30, 32, 54] shown in Fig. 2. In
the model developed in this work, the simulations were done with the addition of
gravitational effects in the $-y$ direction. In the original experiments the air at the
inlet is a fully developed turbulent flow ($u = 10.5$ m/s); this is used as the inlet
condition and, to complete the model, a zero pressure condition is imposed at the outlet.
The remaining boundaries were set as walls. A zero velocity field was set as the initial
condition. The particles used have a 70 μm diameter and 8808 kg/m³ density.

Figure 3 (top) shows a snapshot of the velocity field when the statistically
stationary regime is reached ($t = 85$); the particles are then released and
convected by the flow. Particular trajectories are shown as grey lines in Fig. 3
(bottom). In the same figure, the results of the particle collisions with the walls,
computed with Eq. 2, are also shown.

From the results presented in Fig. 3, the typical BFS velocity profiles can be
recognised. It is important to note the details behind the step: the main flow
originates the secondary eddies and defines the limit of the recirculation zone
($x/H = 7$ from the step) where backflow occurs. Additionally, interesting details
appear between each ripple of the main flow and the walls along the x-direction.

It is noteworthy that particle tracking is evaluated using a steady velocity field;
therefore several irregularities are expected, for instance, particle
trajectories inside the recirculation zone. The erosion rate depends on the number of
collisions at specific points. It is a localised phenomenon that does not occur
continuously in the domain. Its distribution shows a strong dependence on the flow dynamics.


Fig. 2 Geometry of the Backward Facing Step setup (dimension labels visible in the
figure: 2.5 h and 35 h). The initial velocity was set to get a Re = 18,600
Fig. 3 BFS case results. Top: Velocity field at a statistically stationary condition. Bottom:
Distribution of the particles inside the flow (gray lines). The colours on the walls indicate the
location of the normalised erosion rate


4 Conclusion and Future Work 


This work presented a method developed to assess the erosion wear rate using a
high-order (spectral) element based technique on a modified test case implemented
in Nektar++. The methodology proposed in this study has the potential to increase
the accuracy when solving this kind of problem. Future research activities will
focus on determining the accuracy improvements and on optimising
the proposed methodology. Several more cases have to be tested to draw solid
conclusions about the implemented methodology, together with a detailed comparison
with experimental test cases.

Although the implemented methodology includes several important simplifications,
such as the use of one-way coupling and the small number of forces taken into account,
these allowed a quicker implementation and faster results. This work could be an
interesting starting point for implementing this kind of simulation using Nektar++.
However, to run more realistic cases, additional research efforts are required for the
implementation of two-way and four-way coupling and the effects of other forces on
each particle.
in Pipeline Networks


Herbert Egger, Thomas Kugler, and Vsevolod Shashkov 


1 Introduction 


The flow of gas in a horizontal pipeline of constant cross section is described by [2]

$$A\partial_t \rho + \partial_x m = 0, \tag{1}$$

$$\partial_t m + \partial_x\left(\frac{m^2}{A\rho} + A p\right) = -\frac{\lambda}{2D}\,\frac{|m|}{A\rho}\, m. \tag{2}$$

Here $A$ and $D$ are the cross section and diameter of the pipe, and $\lambda$ is a dimensionless
friction parameter. The functions $\rho$, $p$, and $m$ describe the density, pressure, and
mass flow rate of the gas. Under isothermal flow conditions, one has

$$p = c^2 \rho \tag{3}$$


with constant c denoting the speed of sound. In practically relevant scaling regimes, 
the nonlinear term on the left hand side of (2) is usually neglected, which can be 
justified by an asymptotic analysis [2, 7]. Using this simplification and Eq. (3) to 
eliminate the density, one arrives at evolution problems of the general form

$$a\partial_t p + \partial_x m = 0, \tag{4}$$

$$b\partial_t m + \partial_x p = -d\, m, \tag{5}$$

where $a$ and $b$ are positive constants and $d = d(p, m)$ denotes a state dependent
friction coefficient. For our analysis, we will consider $d = d(x)$ as a function
depending only on space which can be justified, e.g., by linearization around a
steady state. Corresponding models for the gas flow on pipe networks are obtained
by coupling the flow equations for single pipes via algebraic conditions [9, 10]; see
below.
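
As a hedged illustration of this reduction step, the following short calculation shows one way to identify the constants of the simplified model. It assumes the isothermal relation $p = c^2\rho$ and a friction law of the form $-\frac{\lambda}{2D}\frac{|m|}{A\rho}m$ on the right hand side of the momentum balance; the resulting expressions for $a$, $b$ and $d$ are worked out here for illustration and are not quoted from the paper.

```latex
% Mass balance: substitute \rho = p / c^2
A\,\partial_t \rho + \partial_x m = 0
\quad\Longrightarrow\quad
\frac{A}{c^2}\,\partial_t p + \partial_x m = 0,
\qquad a = \frac{A}{c^2}.

% Momentum balance without the term \partial_x (m^2 / (A\rho)), divided by A
\partial_t m + A\,\partial_x p = -\frac{\lambda}{2D}\,\frac{|m|}{A\rho}\,m
\quad\Longrightarrow\quad
\frac{1}{A}\,\partial_t m + \partial_x p = -d\,m,
\qquad b = \frac{1}{A},
\quad d(p,m) = \frac{\lambda\,c^2\,|m|}{2D\,A^2\,p}.
```

Under these assumptions, the choice $A = c = 1$ gives $a = b = 1$.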

The discretization of (4)-(5) and its extension to pipeline networks has been 
discussed intensively in the literature. In [9], a Galerkin approximation for (1)-(2) 
with cubic Hermite polynomials is investigated numerically. The discretization of 
transient gas flow models is also studied in [2, 5, 8]. An entropy stable finite volume
method is proposed in [10], and an energy stable mixed finite element approximation 
is investigated in [3]. Apart from [9], all methods discussed above are of lowest order 
and no rigorous convergence analysis is given. 

In this paper, we study the discretization of (4)-(5) by a Petrov-Galerkin 
approach of potentially high order. The resulting scheme is shown to be stable 
which allows us to prove order optimal convergence rates. By using an appropriate 
functional analytic setting, the convergence results can be generalized almost 
verbatim to pipeline networks. A hybridization strategy will be discussed that
facilitates the implementation and that allows one to incorporate non-standard coupling
conditions. The proposed method formally also allows the treatment of nonlinear models
of gas transport and, in principle, high order convergence can be obtained in practically
relevant regimes.


2 Notation and Preliminaries 


Let $x_L < x_R$ and denote by $L^p(x_L, x_R)$ and $W^{k,p}(x_L, x_R)$, $k \ge 0$, the standard
Lebesgue and Sobolev spaces. The scalar product and norm of $L^2(x_L, x_R)$ are
written as $(v, w)$ and $\|v\| = \|v\|_{L^2}$. Other norms will be designated by subscripts.
We write $H^k(x_L, x_R) = W^{k,2}(x_L, x_R)$ for the Hilbert spaces and define

$$H_0^1 = \{v \in H^1(x_L, x_R) : v(x_L) = v(x_R) = 0\} \quad \text{and} \quad H(\mathrm{div}) = H^1(x_L, x_R)$$

for convenience. The reason for introducing the space $H(\mathrm{div})$ will become clear
when considering networks, where the spaces $H_0^1$ and $H(\mathrm{div})$ have different
continuity properties across junctions. By $L^p(0, T; X)$ and $W^{k,p}(0, T; X)$ we
denote the Bochner spaces of functions $f : [0, T] \to X$ with values in $X$. The
value of $f(t)$ may then itself be a function. In the following, we consider the linear
system

$$a\partial_t p(x, t) + \partial_x m(x, t) = f(x, t), \tag{6}$$

$$b\partial_t m(x, t) + \partial_x p(x, t) + d(x)\, m(x, t) = g(x, t), \tag{7}$$

for $x_L < x < x_R$ and $t > 0$ with homogeneous boundary conditions

$$p(x_L, t) = p(x_R, t) = 0. \tag{8}$$

Inhomogeneous and more general boundary conditions can be considered as well
and our analysis applies with minor modifications. We will assume that

(A1) $a, b$ are positive constants, and
(A2) $d \in L^\infty(x_L, x_R)$ with $0 < \underline{d} \le d(x) \le \overline{d}$ and constants $\underline{d}$, $\overline{d}$.


For given $f, g \in L^2(0, T; L^2(x_L, x_R))$ and initial values $p(0) \in H_0^1$, $m(0) \in H(\mathrm{div})$,
existence of a unique solution follows from semigroup theory. Any smooth
solution of problem (6)-(8) also satisfies $p(t) \in H_0^1$, $m(t) \in H(\mathrm{div})$, and

$$(a\partial_t p(t), \tilde q) + (\partial_x m(t), \tilde q) = (f(t), \tilde q), \tag{9}$$

$$(b\partial_t m(t), \tilde v) + (\partial_x p(t), \tilde v) + (d\, m(t), \tilde v) = (g(t), \tilde v), \tag{10}$$

for all $\tilde v, \tilde q \in L^2(x_L, x_R)$ and all $0 \le t \le T$. This variational characterization will
be the starting point for our discretization approach introduced in the next section.


3 Petrov-Galerkin Approximation 


Let $x_L = x_0 < x_1 < \ldots < x_N = x_R$ be a partition of the interval $[x_L, x_R]$ into
elements $T_n = [x_{n-1}, x_n]$. We call $\mathcal{T}_h := \{T_n : 1 \le n \le N\}$ the mesh and denote by
$h_n = |x_n - x_{n-1}|$ and $h = \max_n h_n$ the local and global mesh size, respectively. Let

$$P_k(\mathcal{T}_h) := \{v \in L^2(x_L, x_R) : v|_T \in P_k(T)\ \ \forall T \in \mathcal{T}_h\} \tag{11}$$

be the space of piecewise polynomials on the mesh $\mathcal{T}_h$. We fix $k \ge 1$ and search for
approximations for the solutions $p(t)$, $m(t)$ of problem (6)-(8) in the spaces

$$Q_h = P_k(\mathcal{T}_h) \cap H_0^1 \quad \text{and} \quad V_h = P_k(\mathcal{T}_h) \cap H(\mathrm{div}) \tag{12}$$

of continuous piecewise polynomials with appropriate boundary conditions. As
finite dimensional test spaces for the variational problem (9)-(10), we choose

$$\tilde Q_h = P_{k-1}(\mathcal{T}_h) \quad \text{and} \quad \tilde V_h = P_{k-1}(\mathcal{T}_h) \tag{13}$$
consisting of discontinuous piecewise polynomials of lower order $k - 1$. We denote
by $I_h^k : H^1(x_L, x_R) \to P_k(\mathcal{T}_h) \cap H^1(x_L, x_R)$ the $H^1$-projection operator, defined
by

$$(I_h^k v)(x_n) = v(x_n) \quad \text{for all } 0 \le n \le N, \tag{14}$$

$$\text{and} \quad (\partial_x I_h^k v, \tilde v_h) = (\partial_x v, \tilde v_h) \quad \text{for all } \tilde v_h \in P_{k-1}(\mathcal{T}_h), \tag{15}$$

and let $\pi_h^{k-1} : L^2(x_L, x_R) \to P_{k-1}(\mathcal{T}_h)$ be the $L^2$-orthogonal projection, satisfying

$$(\pi_h^{k-1} v, \tilde v_h) = (v, \tilde v_h) \quad \text{for all } \tilde v_h \in P_{k-1}(\mathcal{T}_h). \tag{16}$$

Note that both projection operators $I_h^k$ and $\pi_h^{k-1}$ can be defined locally on every
element. Moreover, they are mutually related to each other by the commuting
diagram property

$$\partial_x I_h^k v = \pi_h^{k-1} \partial_x v \quad \text{for all } v \in H^1(x_L, x_R). \tag{17}$$
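
For the lowest order case the commuting diagram property can be checked by hand. The following short calculation is a sketch for $k = 1$ only, where the interpolant is assumed to be the piecewise linear function matching $v$ at the mesh points and the projection onto piecewise constants is the elementwise mean value.

```latex
% k = 1: I_h^1 v is the piecewise linear nodal interpolant, so on T_n = [x_{n-1}, x_n]
\partial_x I_h^1 v \big|_{T_n}
  = \frac{v(x_n) - v(x_{n-1})}{h_n}
  = \frac{1}{h_n} \int_{x_{n-1}}^{x_n} \partial_x v \, dx
  = \pi_h^0 \partial_x v \big|_{T_n}.
```

The middle equality is the fundamental theorem of calculus, and the last one is the definition of the mean value over the element.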


For the approximation of problem (6)-(8), we then use the following method.


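Problem 1 (Inexact Petrov-Galerkin Method) Find functions $p_h \in H^1(0, T; Q_h)$,
$m_h \in H^1(0, T; V_h)$ with $p_h(0) = I_h^k p(0)$ and $m_h(0) = I_h^k m(0)$, and such that

$$(a\partial_t p_h(t), \tilde q_h) + (\partial_x m_h(t), \tilde q_h) = (f(t), \tilde q_h), \tag{18}$$

$$(b\partial_t m_h(t), \tilde v_h) + (\partial_x p_h(t), \tilde v_h) + (d\, \pi_h^{k-1} m_h(t), \tilde v_h) = (g(t), \tilde v_h), \tag{19}$$

for all $\tilde q_h \in \tilde Q_h = P_{k-1}(\mathcal{T}_h)$ and $\tilde v_h \in \tilde V_h = P_{k-1}(\mathcal{T}_h)$, and for all $0 \le t \le T$.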


The well-posedness of this problem follows from the results of the next section. 
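
A quick dimension count, given here as an illustrative sanity check rather than as part of the analysis, shows that the trial and test spaces of Problem 1 match in total size. It assumes a single pipe with $N$ elements, continuous trial functions of degree $k$ (with zero boundary values for the pressure) and discontinuous test functions of degree $k-1$.

```latex
% Continuous piecewise P_k functions on N elements have N(k+1) - (N-1) = Nk + 1 degrees of freedom.
\dim Q_h = (Nk + 1) - 2 = Nk - 1, \qquad \dim V_h = Nk + 1,
% Discontinuous piecewise P_{k-1} functions have k degrees of freedom per element.
\dim \tilde Q_h = \dim \tilde V_h = Nk,
% Totals agree, so the semidiscrete system has as many equations as unknowns.
\dim Q_h + \dim V_h = 2Nk = \dim \tilde Q_h + \dim \tilde V_h .
```

The individual blocks are not square (for instance, $\dim Q_h \ne \dim \tilde Q_h$), so the count alone does not imply solvability.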


4 Discrete Stability Estimates 


We now derive some discrete stability estimates that yield well-posedness of the 
semidiscrete method and that allow us to establish error estimates of optimal order. 


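Lemma 1 Let $p_h$, $m_h$ denote a solution of Problem 1. Then

$$a\|\pi_h^{k-1} p_h(t)\|^2 + b\|\pi_h^{k-1} m_h(t)\|^2 \le C(T)\Big( a\|\pi_h^{k-1} p_h(0)\|^2 + b\|\pi_h^{k-1} m_h(0)\|^2 + \int_0^t \frac{1}{a}\|\pi_h^{k-1} f(s)\|^2 + \frac{1}{b}\|\pi_h^{k-1} g(s)\|^2 \, ds \Big)$$

with constant $C(T) \le CT$ and $C$ independent of $T$ and the solution.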
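Proof Let us first note that $(\pi_h^{k-1} q_h, \pi_h^{k-1} \tilde q) = (q_h, \pi_h^{k-1} \tilde q)$ for all $\tilde q \in L^2(x_L, x_R)$.
By testing (18)-(19) with $\tilde q_h = \pi_h^{k-1} p_h(t)$ and $\tilde v_h = \pi_h^{k-1} m_h(t)$, we then get

$$
\begin{aligned}
\frac{d}{dt}\Big(\frac{a}{2}\|\pi_h^{k-1} p_h(t)\|^2 + \frac{b}{2}\|\pi_h^{k-1} m_h(t)\|^2\Big)
&= (a\partial_t p_h(t), \pi_h^{k-1} p_h(t)) + (b\partial_t m_h(t), \pi_h^{k-1} m_h(t)) \\
&= -(\partial_x m_h(t), \pi_h^{k-1} p_h(t)) - (\partial_x p_h(t), \pi_h^{k-1} m_h(t)) - (d\, \pi_h^{k-1} m_h(t), \pi_h^{k-1} m_h(t)) \\
&\quad + (\pi_h^{k-1} f(t), \pi_h^{k-1} p_h(t)) + (\pi_h^{k-1} g(t), \pi_h^{k-1} m_h(t)).
\end{aligned}
$$

By identity (17), integration-by-parts, and the boundary conditions (8), one can
verify that $(\partial_x m_h(t), \pi_h^{k-1} p_h(t)) + (\partial_x p_h(t), \pi_h^{k-1} m_h(t)) = 0$. Via Cauchy-Schwarz
and Young inequalities, and using positivity of $d$, we then obtain the
estimate

$$
\begin{aligned}
\frac{d}{dt}\Big(\frac{a}{2}\|\pi_h^{k-1} p_h(t)\|^2 + \frac{b}{2}\|\pi_h^{k-1} m_h(t)\|^2\Big)
&= -(d\, \pi_h^{k-1} m_h(t), \pi_h^{k-1} m_h(t)) + (\pi_h^{k-1} f(t), \pi_h^{k-1} p_h(t)) + (\pi_h^{k-1} g(t), \pi_h^{k-1} m_h(t)) \\
&\le \alpha\Big(\frac{a}{2}\|\pi_h^{k-1} p_h(t)\|^2 + \frac{b}{2}\|\pi_h^{k-1} m_h(t)\|^2\Big) + \frac{1}{2\alpha}\Big(\frac{1}{a}\|\pi_h^{k-1} f(t)\|^2 + \frac{1}{b}\|\pi_h^{k-1} g(t)\|^2\Big).
\end{aligned}
$$

The Gronwall lemma and the choice $\alpha = 1/T$ finally yield the assertion. $\square$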
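
The final Gronwall step can be spelled out in a generic form. The following is a sketch under stated assumptions: $y(t) \ge 0$ stands for the weighted energy of the projected solution, $r(t) \ge 0$ collects the data terms (with any fixed numerical factors absorbed into it), and both symbols are introduced here for illustration only.

```latex
% Assume y(t) \ge 0 and r(t) \ge 0 satisfy, for some \alpha > 0,
y'(t) \le \alpha\, y(t) + \frac{1}{\alpha}\, r(t), \qquad 0 \le t \le T.
% Multiply by e^{-\alpha t} and integrate from 0 to t:
y(t) \le e^{\alpha t}\, y(0) + \frac{1}{\alpha} \int_0^t e^{\alpha (t-s)}\, r(s)\, ds.
% With \alpha = 1/T and 0 \le s \le t \le T one has e^{\alpha (t-s)} \le e, hence
y(t) \le e \left( y(0) + T \int_0^t r(s)\, ds \right).
```

Choosing $\alpha = 1/T$ thus keeps the exponential factor bounded by $e$ and makes the dependence on the time horizon explicit and only linear.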


Note that the above estimate does not yet give full control over the solution. A 
repeated application, however, allows us to prove the following stability estimate. 


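Lemma 2 Let $p_h$, $m_h$ denote a solution of Problem 1. Then

$$
\begin{aligned}
\|p_h(t)\|^2 + \|m_h(t)\|^2
\le C'(T)\Big( & \|\pi_h^{k-1} p_h(0)\|^2 + \|\pi_h^{k-1} m_h(0)\|^2 + h^2\|\pi_h^{k-1} \partial_t p_h(0)\|^2 + h^2\|\pi_h^{k-1} \partial_t m_h(0)\|^2 \\
& + \int_0^t \|\pi_h^{k-1} f(s)\|^2 + \|\pi_h^{k-1} g(s)\|^2 + h^2\|\pi_h^{k-1} \partial_t f(s)\|^2 + h^2\|\pi_h^{k-1} \partial_t g(s)\|^2 \, ds \Big)
\end{aligned}
$$

for all $0 \le t \le T$ with $C'(T) = C'T$ and $C'$ independent of $T$ and of the solution.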


Proof As a direct consequence of the Poincaré inequality, one has

$$\|p_h\| \le \|\pi_h^{k-1} p_h\| + h\|\partial_x p_h\| \quad \text{and} \quad \|m_h\| \le \|\pi_h^{k-1} m_h\| + h\|\partial_x m_h\|.$$

The first terms in these estimates are already covered by Lemma 1. From the two
Eqs. (18)-(19) with $\tilde q_h = \partial_x m_h(t)$ and $\tilde v_h = \partial_x p_h(t)$, we further deduce that

$$\|\partial_x m_h(t)\|^2 \le \big(\|\pi_h^{k-1} f(t)\| + a\|\pi_h^{k-1} \partial_t p_h(t)\|\big)\|\partial_x m_h(t)\| \quad \text{and}$$

$$\|\partial_x p_h(t)\|^2 \le \big(\|\pi_h^{k-1} g(t)\| + b\|\pi_h^{k-1} \partial_t m_h(t)\| + \overline{d}\|\pi_h^{k-1} m_h(t)\|\big)\|\partial_x p_h(t)\|.$$

Bounds for $\|\pi_h^{k-1} \partial_t p_h(t)\|$ and $\|\pi_h^{k-1} \partial_t m_h(t)\|$ can be obtained by formally
differentiating (18)-(19) with respect to time and applying Lemma 1 for the resulting
system. A combination of the above estimates then yields the assertion of the
lemma. $\square$


Remark 1 Problem 1 formally amounts to a finite dimensional system of differential
algebraic equations. From the stability estimates of Lemma 2 and [6, Theorem 4.12],
one can deduce that this system is solvable for any choice of admissible initial
values. The semidiscretization is thus well-defined. Further note that the stability
constants in Lemmas 1 and 2 are independent of the polynomial degree $k$.


5 Error Estimates 


As usual, we decompose the error according to $\|p - p_h\| \le \|p - I_h^k p\| + \|I_h^k p - p_h\|$
and $\|m - m_h\| \le \|m - I_h^k m\| + \|I_h^k m - m_h\|$ into approximation and discrete
error components. The first part can be handled by the following estimates [11]. To
simplify notation, we assume that the mesh is quasi-uniform in the following.


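Lemma 3 Let $w \in H^{s+1}(\mathcal{T}_h)$, $0 \le s \le k$. Then

$$\|w - I_h^k w\| \le C \Big(\frac{h}{k}\Big)^{s+1} |w|_{s+1;h}. \tag{20}$$

For any $w \in L^2(x_L, x_R) \cap H^s(\mathcal{T}_h)$, $0 \le s \le k$, one has

$$\|w - \pi_h^{k-1} w\| \le C \Big(\frac{h}{k}\Big)^{s} |w|_{s;h}. \tag{21}$$

Here $H^s(\mathcal{T}_h) = \{w \in L^2(x_L, x_R) : w|_T \in H^s(T)\}$ is the space of piecewise smooth
functions and $|w|_{s;h} := \big(\sum_T |w|_{H^s(T)}^2\big)^{1/2}$ is the corresponding seminorm.
Moreover, the constant $C$ in the estimates is independent of $h$ and $k$.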


Using Eqs. (9)-(10) and (18)-(19) characterizing the continuous and the discrete
solutions, one can see that the discrete error components $\tilde p_h(t) := I_h^k p(t) - p_h(t)$
and $\tilde m_h(t) := I_h^k m(t) - m_h(t)$ satisfy Eqs. (18)-(19) with initial values $\tilde p_h(0) = 0$
and $\tilde m_h(0) = 0$, and right hand sides given by

$$\tilde f(t) := a\big(I_h^k \partial_t p(t) - \partial_t p(t)\big) \quad \text{and}$$

$$\tilde g(t) := b\big(I_h^k \partial_t m(t) - \partial_t m(t)\big) + d\big(\pi_h^{k-1} I_h^k m(t) - m(t)\big).$$

By the a-priori estimates of Lemma 2, one then obtains the following result.


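Lemma 4 Let $d \in P_0(\mathcal{T}_h)$ be piecewise constant. Then for all $0 \le t \le T$ one has

$$
\begin{aligned}
\|I_h^k p(t) - p_h(t)\|^2 + \|I_h^k m(t) - m_h(t)\|^2
\le C''(T)\Big( & h^2\|I_h^k \partial_t p(0) - \partial_t p(0)\|^2 + h^2\|I_h^k \partial_t m(0) - \partial_t m(0)\|^2 \\
& + \int_0^t \|I_h^k m(s) - m(s)\|^2 + \|I_h^k \partial_t p(s) - \partial_t p(s)\|^2 + \|I_h^k \partial_t m(s) - \partial_t m(s)\|^2 \\
& + h^2\|I_h^k \partial_{tt} p(s) - \partial_{tt} p(s)\|^2 + h^2\|I_h^k \partial_{tt} m(s) - \partial_{tt} m(s)\|^2 \, ds \Big),
\end{aligned}
$$

with a constant $C''(T) = C''T$ and $C''$ independent of $h$, $k$, $T$, and of the solution.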


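Proof We apply Lemma 2 for $\tilde p_h(t) = I_h^k p(t) - p_h(t)$ and $\tilde m_h(t) = I_h^k m(t) - m_h(t)$
and then estimate the terms on the right hand side of the result step by step. By
definition of the initial values, we have $\tilde p_h(0) = \tilde m_h(0) = 0$. Moreover,

$$
\begin{aligned}
a\pi_h^{k-1} \partial_t p_h(0) &= \pi_h^{k-1} f(0) - \partial_x m_h(0) = \pi_h^{k-1} f(0) - \partial_x I_h^k m(0) \\
&= \pi_h^{k-1} f(0) - \pi_h^{k-1} \partial_x m(0) = a\pi_h^{k-1} \partial_t p(0),
\end{aligned}
$$

where we used the definition of the initial value $m_h(0)$ in the second and (17)
in the third step. Thus $\|\pi_h^{k-1} \partial_t \tilde p_h(0)\| = \|\pi_h^{k-1}(I_h^k \partial_t p(0) - \partial_t p(0))\| \le \|I_h^k \partial_t p(0) - \partial_t p(0)\|$,
and in a similar manner, one can show $\|\pi_h^{k-1} \partial_t \tilde m_h(0)\| \le \|I_h^k \partial_t m(0) - \partial_t m(0)\|$.
This explains the first two terms in the estimate in the lemma. The terms under the
integral are derived by estimating $\|\pi_h^{k-1} \tilde f(t)\|$, $\|\pi_h^{k-1} \tilde g(t)\|$ and the derivatives
$\|\pi_h^{k-1} \partial_t \tilde f(t)\|$, $\|\pi_h^{k-1} \partial_t \tilde g(t)\|$ via the triangle inequality, and noting that

$$\pi_h^{k-1}\big(d\, \pi_h^{k-1} I_h^k m(t) - d\, m(t)\big) = d\, \pi_h^{k-1}\big(I_h^k m(t) - m(t)\big),$$

where we used that $d$ is piecewise constant. $\square$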


Remark 2 A similar result can be proven for piecewise smooth $d \in W^{1,\infty}(\mathcal{T}_h)$ and
additional terms of the form $\|d - \pi_h^0 d\|_{L^\infty} \|\pi_h^{k-1} p(t) - p(t)\|$ arise. For $d \in W^{1,\infty}(\mathcal{T}_h)$,
the product of the two terms again has optimal approximation order.


By combination of the above estimates, we finally obtain the following result. 
Theorem 1 Let (A1)-(A2) hold and $d \in W^{1,\infty}(\mathcal{T}_h)$. Furthermore, let $(p, m)$ be a
sufficiently smooth solution of (6)-(8). Then for all $0 \le t \le T$, one has

$$\|p(t) - p_h(t)\| + \|m(t) - m_h(t)\| \le C(m, p, T)\, \frac{h^{k+1}}{k^{k}}.$$

For sufficiently smooth solutions, the proposed method thus converges at optimal
order in $h$ and at almost optimal order in the polynomial degree $k$.


6 Extension to Networks 


We now illustrate that our method and the convergence results of the previous
section can be generalized easily to pipe networks. Let $(\mathcal{V}, \mathcal{E})$ denote a directed
graph with vertices $v \in \mathcal{V}$ and edges $e \in \mathcal{E}$; see Fig. 1 for illustration. For any edge
$e = (v_1, v_2)$, we define $n^e(v_1) = -1$ and $n^e(v_2) = 1$. The matrix $N$ with entries
$N_{ij} = n^{e_i}(v_j)$ then is the incidence matrix of the graph. For any vertex $v \in \mathcal{V}$, we
define $\mathcal{E}(v) = \{e : e = (v, \cdot) \text{ or } e = (\cdot, v)\}$, and we set $\mathcal{V}_0 = \{v \in \mathcal{V} : |\mathcal{E}(v)| > 1\}$
and $\mathcal{V}_\partial = \{v \in \mathcal{V} : |\mathcal{E}(v)| = 1\}$ which gives a decomposition $\mathcal{V} = \mathcal{V}_0 \cup \mathcal{V}_\partial$ into
interior and boundary vertices.
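
The graph notation can be made concrete with a small sketch. The Y-shaped network below is a hypothetical example (it is not the topology of Fig. 1), the edge-by-vertex layout of the matrix is an assumed convention, and all variable names are made up for illustration. The code builds the incidence matrix and splits the vertices into interior and boundary vertices by counting incident edges.

```python
import numpy as np

# Hypothetical Y-shaped network: one inflow pipe splitting into two pipes.
edges = [(0, 1), (1, 2), (1, 3)]   # each edge e = (v1, v2) points from v1 to v2
n_vertices = 4

# N[i, j] = n^{e_i}(v_j): -1 at the tail, +1 at the head, 0 if v_j is not on e_i.
N = np.zeros((len(edges), n_vertices), dtype=int)
for i, (tail, head) in enumerate(edges):
    N[i, tail] = -1
    N[i, head] = 1

degree = np.abs(N).sum(axis=0)     # number of incident edges |E(v)| per vertex
interior = [v for v in range(n_vertices) if degree[v] > 1]
boundary = [v for v in range(n_vertices) if degree[v] == 1]

print(N)
print("interior vertices:", interior)
print("boundary vertices:", boundary)
```

Each row of the matrix sums to zero because every edge has exactly one tail and one head.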

To every edge $e$, we associate a positive length $\ell^e$, and we identify $e$ with $[0, \ell^e]$
in the sequel. This allows us to define spaces $L^p(\mathcal{E}) = \{v : v|_e \in L^p(e)\}$ and
$H^1(\mathcal{E}) = \{v \in L^2(\mathcal{E}) : v|_e \in H^1(e)\}$ of, respectively, integrable and piecewise
smooth functions on the graph. The flow of gas in a pipe network is then described
as follows: On every edge $e$ representing a pipe, we require that

$$a^e \partial_t p^e + \partial_x m^e = f^e, \tag{22}$$

$$b^e \partial_t m^e + \partial_x p^e + d^e m^e = g^e, \tag{23}$$

where $f^e = f|_e$ denotes the restriction of a function $f \in L^p(\mathcal{E})$ to one edge. The
equations for the individual pipes are coupled by algebraic conditions

$$\sum_{e \in \mathcal{E}(v)} m^e(v)\, n^e(v) = 0, \qquad v \in \mathcal{V}_0, \tag{24}$$

$$p^e(v) = p^{e'}(v), \qquad v \in \mathcal{V}_0, \quad e, e' \in \mathcal{E}(v), \tag{25}$$

at the pipe junctions, and at the boundary vertices, we assume that

$$p^e(v) = 0, \qquad v \in \mathcal{V}_\partial. \tag{26}$$
Fig. 1 Directed graph $(\mathcal{V}, \mathcal{E})$ modeling the pipe network topology used for
numerical tests


Inhomogeneous and other types of boundary conditions can again be incorporated 
with minor modifications. For the analysis of the problem, we now utilize the spaces 


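$$H_0^1 := \{p \in H^1(\mathcal{E}) : \text{(25) and (26) are valid}\} \tag{27}$$

$$H(\mathrm{div}) := \{m \in H^1(\mathcal{E}) : \text{(24) is valid}\} \tag{28}$$

which are the natural generalization of those used for the analysis on a single pipe.
Any solution $(p, m)$ of (22)-(26) then again satisfies $p(t) \in H_0^1$, $m(t) \in H(\mathrm{div})$,
and

$$(a\partial_t p(t), \tilde q) + (\partial_x m(t), \tilde q) = (f(t), \tilde q), \tag{29}$$

$$(b\partial_t m(t), \tilde v) + (\partial_x p(t), \tilde v) + (d\, m(t), \tilde v) = (g(t), \tilde v), \tag{30}$$

for all $\tilde q \in L^2(\mathcal{E})$, $\tilde v \in L^2(\mathcal{E})$, and all $0 \le t \le T$. Here $(v, w) = \sum_e (v^e, w^e)$, with
$(v^e, w^e) = \int_e v^e w^e \, dx$, denotes the scalar product on $L^2(\mathcal{E})$.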


Remark 3 Let us note that (29)-(30) has exactly the same form as the variational 
problem (18)-(19) on a single pipe. The inexact Petrov-Galerkin method and all 
results derived in the previous sections therefore translate almost verbatim to the 
network setting; let us refer to [4] for details and similar results for a different 
method, and to Sect. 9 for numerical illustration. 


7 Remarks on the Efficient Implementation 


In the discretization of (29)-(30), also compare with (18)-(19), the continuity and 
boundary conditions (24)-(26) are directly incorporated in the definition of the 
spaces $Q_h \subset H_0^1$ and $V_h \subset H(\mathrm{div})$. For the implementation, it may be more
convenient to use larger spaces $Q_h, V_h \subset H^1(\mathcal{E})$, and to enforce some of the
boundary and coupling conditions (24)-(26) explicitly by additional equations. 
Using the wording of [1], this approach of relaxing continuity conditions might 
be called hybridization. Since the resulting method is algebraically equivalent to the 
original scheme based on function spaces with incorporated coupling and boundary 
conditions, all results of the previous sections apply verbatim also to the method 
obtained after hybridization. 
8 Nonlinear Problems


The formal extension of the Petrov-Galerkin method to nonlinear problems is
straightforward. The discrete variational formulation for (1)-(2), for instance, reads

$$(A\partial_t \rho_h(t), \tilde q_h) + (\partial_x m_h(t), \tilde q_h) = 0,$$

$$(\partial_t m_h(t), \tilde v_h) + \Big(\partial_x\Big(\frac{m_h(t)^2}{A\rho_h(t)} + A p_h(t)\Big), \tilde v_h\Big) = -\Big(\frac{\lambda}{2D}\, \frac{|m_h(t)|}{A\rho_h(t)}\, m_h(t), \tilde v_h\Big).$$

Numerical quadrature can be used in practice to facilitate the handling of the 
nonlinear terms. We do not give a complete convergence analysis here, but instead, 
we will demonstrate by numerical tests that for smooth solutions, the convergence 
results of Theorem 1 remain valid, at least in the practically relevant case of 
nonlinear friction. 


9 Numerical Results 


We now illustrate the theoretical results of Sect. 5 by numerical tests. For our
computations, we consider the pipe network depicted in Fig. 1. As a first test case,
we consider the linear problem (22)-(25) with inhomogeneous boundary conditions

$$p^e(v) = p_v(t), \qquad v \in \mathcal{V}_\partial, \tag{31}$$


and we set py, (f) = 1 and py,(t) = 1+ i sin(zt) in the following. All pipes are 
chosen of unit length $\ell^e = 1$ and the model parameters are set to $a = b = d = 1$.
The simulation is started from a stationary state for the boundary values at initial 
time. The results of the computations are summarized in the left column of Table 1. 
As predicted by our theoretical results, we observe second order convergence. 
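
The estimated order of convergence (eoc) reported in Table 1 can be recomputed from the listed errors. The sketch below assumes the usual definition, the logarithm of the ratio of consecutive errors divided by the logarithm of the ratio of consecutive mesh sizes; the variable and function names are hypothetical. Because the table lists rounded errors, the recomputed rates can differ slightly from the printed ones.

```python
import math

# Mesh sizes and errors as listed in Table 1.
h = [0.1, 0.05, 0.025, 0.0125, 0.00625]
errors = {
    "linear":      [0.01936, 0.00482, 0.00120, 0.00030, 0.00008],
    "semilinear":  [0.02359, 0.00660, 0.00168, 0.00042, 0.00011],
    "quasilinear": [0.02534, 0.00693, 0.00200, 0.00076, 0.00036],
}

def eoc(h_values, e_values):
    """Estimated order of convergence between consecutive refinement levels."""
    return [
        math.log(e_values[i - 1] / e_values[i]) / math.log(h_values[i - 1] / h_values[i])
        for i in range(1, len(e_values))
    ]

rates = {name: eoc(h, e) for name, e in errors.items()}
for name, values in rates.items():
    print(name, [round(r, 2) for r in values])
```

On the coarser levels the linear and semilinear rates come out close to two, while the quasilinear rates drop towards one on the finest meshes.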

We now repeat our numerical tests for the same network but with a semilinear 


gas flow model resulting from (1)-(3) by dropping the nonlinear term ay (25) in 


Table 1 Errors e; = (all pa (T) — Pr (I? +bl|ima (T) = mr p(T at time T = 10 obtained 
with the Petrov-Galerkin approximation for the network problem with different gas flow models: 
linear model (left), semilinear model (middle), and quasilinear model (right) 


h Linear eoc Semilinear eoc Quasilinear eoc 
0.10000 0.01936 E 0.02359 = 0.02534 = 
0.05000 0.00482 2.00 0.00660 1.83 0.00693 1.87 
0.02500 0.00120 2.00 0.00168 1.97 0.00200 1.79 
0.01250 0.00030 2.00 0.00042 1.99 0.00076 140 


0.00625 0.00008 2.00 0.00011 2.00 0.00036 1.09 
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Fig. 2 Flow rates at boundary vertices v, and vg for linear, semilinear, and quasilinear flow models 


Eq. (2). The model parameters are chosen аз A = 1, c = 1, and A/(2D) = 7/2; the 
latter was selected such that average of the resulting mass flow was similar to that 
of the linear model considered above. The computational results are depicted in the 
middle column of Table 1. Also for this nonlinear friction model, we observe second 
order convergence. These results can be explained theoretically in a similar way as 
those for the linear case by using a perturbation argument. In the right column of 
Table 1, we display the corresponding results for the quasilinear flow model (1)- 
(3) with the same parameters as used in the semilinear case. Note that a decrease 
in the convergence rates to first order is observed here. This is no surprise, since 
our analysis heavily relied on the anti-symmetry of the spatial derivative terms in 
(18)-(19), which is no longer valid for the quasilinear model (1)-(2). 
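
The convergence rates quoted from Table 1 can be recomputed from the error columns. The following is a minimal sketch, assuming the tabulated eoc follows the usual definition eoc = log2(e_h / e_{h/2}) for successively halved mesh sizes; the function name `estimated_orders` is hypothetical, and because the tabulated errors are rounded the recomputed rates agree with the table only approximately.

```python
import math

def estimated_orders(errors):
    """Return eoc values for errors measured on meshes h, h/2, h/4, ..."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]

# Errors of the linear and quasilinear models from Table 1 (h = 0.1, ..., 0.00625)
linear_errors = [0.01936, 0.00482, 0.00120, 0.00030, 0.00008]
quasilinear_errors = [0.02534, 0.00693, 0.00200, 0.00076, 0.00036]

for name, errs in [("linear", linear_errors), ("quasilinear", quasilinear_errors)]:
    print(name, [f"{rate:.2f}" for rate in estimated_orders(errs)])
```

The drop of the quasilinear rates towards first order on the finer meshes is visible directly in the recomputed values.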

In Fig. 2, we display the flow rates т |, at the boundary vertices v; and vg for the 
three different gas flow models discussed above as function of time. The results are 
in reasonable agreement. In summary, the semilinear model seems to yield the best 
compromise between modelling errors and convergence order. 


New Preconditioners for Semi-linear PDE-Constrained Optimal Control 
in Annular Geometries 


Lasse Hjuler Christiansen and John Bagterp Jørgensen 


1 Introduction 


Large-scale optimization problems that are constrained by partial differential 
equations (PDEs) play a key role in various fields of science and engineering [2, 10]. 
The size and complexity of the PDE constraints present severe 
computational difficulties that often prevent the use of general-purpose black-box 
optimizers. As a consequence, cost-efficient, specialized solvers become essential 
[1, 3, 7, 8]. As a contribution in this direction, this paper demonstrates how to extend 
seminal ideas of Shen [15-17] to construct fast and memory-efficient optimizers 
for the class of semi-linear PDE-constrained optimization problems with non-linear 
reaction kinetics 


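$$\min_{y,\; u \in U_{ad}} \ \frac{1}{2}\int_{\Omega} \big(y(x) - y_d\big)^2\,dx + \frac{\rho}{2}\int_{\Omega} u(x)^2\,dx, \qquad (1a)$$
$$\text{s.t.} \quad -\Delta y + G(y) = u \quad \text{in } \Omega. \qquad (1b)$$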


The paper focuses on the specific cases of either homogeneous (1) Dirichlet or (2) 
Neumann boundary conditions, where $\Omega \subset \mathbb{R}^2$ is an annular domain of the type 


$$\Omega := \{(x, y) \in \mathbb{R}^2 \mid a < x^2 + y^2 < b\}, \qquad 0 < a < b. \qquad (2)$$


For a given non-linear reaction term, $G(\cdot)$, and Tikhonov regularization parameter, 
$\rho > 0$, the control problem (1) aims to determine the optimal state and control 
variables, $(y^*, u^*)$, that minimize the objective (1a). Here the optimal solution must 
belong to the set of feasible pairs, $(y, u)$, that satisfy the PDE-constraints (1b) and 
the additional admissibility condition, $u \in U_{ad}$. To be concrete, this paper focuses 
on the case of bi-lateral point-wise control constraints 


$$U_{ad} = \{u \in L^2(\Omega) : u_a \le u(x) \le u_b \ \text{a.e. in } \Omega\}. \qquad (3)$$


Point-wise bounds of the type (3) appear in a number of practical applications, 
where the control must satisfy, e.g., operational limitations that are not naturally 
captured by the underlying PDE (1b). In the limiting case, where $u_a := -\infty$ and 
$u_b := \infty$, the admissible set becomes $U_{ad} = L^2(\Omega)$. This corresponds to the case 
where the PDE (1b) constitutes the only constraint. 


1.1 Main Contributions and Outline 


This paper contributes to a recent series of efforts by the authors that seek 
to construct fast, iterative solvers for a range of PDE-constrained optimization 
problems by exploiting the properties of customized spectral bases [4-6]. This series 
of work aims to introduce a high-order alternative to the widely-used constellation 
of low-order finite-element methods and Schur-complement preconditioners that 
currently predominates the literature on PDE control [12-14]. Previous efforts have 
mainly considered distributed control of elliptic and parabolic non-linear diffusion- 
reaction systems. The main focus has been on problems in rectangular domains, 
where PDEs constitute the only constraints. As a natural extension, this paper 
investigates how to modify the existing methods to account for (1) bound constraints 
of the type (3) and (2) different geometries. For the sake of brevity, the paper restricts 
attention to annular domains (2). However, with slight modifications, the approach 
generalizes to cylindrical geometries of the type 


$$\Omega_c := \{(x, y, z) \in \mathbb{R}^3 \mid a < x^2 + y^2 < b,\ z \in (0, h)\}, \qquad 0 < a < b. \qquad (4)$$


As the main contribution, this work proposes a collection of Poisson-like precon- 
ditioners that are customized for efficient solution of the control problems (1) by a 
semi-smooth Newton (SSN) strategy [9]. Similar to a traditional Newton method, 
the SSN scheme solves (1) iteratively by finding a locally optimal solution to the 
non-linear Karush-Kuhn-Tucker (KKT) optimality conditions by solving a sequence 
of linearized, variable-coefficient subproblems. Direct solution of the subproblems 
is often time consuming and requires considerable memory allocation. To this end, 
the new preconditioners are designed to promote efficient solution of the SSN 
subproblems by appropriate Krylov subspace (KSP) methods. Following seminal 
ideas of Shen [16], the preconditioners rely on fast direct solvers for constant- 
coefficient problems that exploit (1) the structure of boundary-adapted spectral 
bases and (2) the separable nature of annular domains. As the main feature, inversion 
of the preconditioners decouples into a sequence of independent 2 × 2 systems. 
This implies that the preconditioners can be applied matrix-free and scale linearly 
with the problem size. In addition, the independence of the 2 × 2 systems makes 
the preconditioners amenable to parallelization. To establish proof-of-concept, a 
numerical case study solves (1), where $G(\cdot)$ is given by a cubic non-linearity. 
The results demonstrate computational efficiency and show that the preconditioners 
respond well to different problem sizes, boundary conditions, point-wise bound 
constraints and various choices of the regularization parameter, $\rho > 0$. 

To establish the necessary background, Sect. 2 outlines how to solve the optimal 
control problem (1) using the SSN scheme. Further, to motivate the contributions 
of this paper, the section discusses some of the computational challenges that 
arise from discretization of the associated linearized subproblems. These challenges 
naturally lead to the construction of the new Poisson-like preconditioners in Sect. 3. 
Section 4 presents numerical results, while Sect. 5 draws overall conclusions and 
addresses future work. 


2 Motivation: A Semi-smooth Newton Method 


This paper solves the control problem (1) by a semi-smooth Newton strategy [9]. 
The SSN scheme seeks to generate a locally optimal solution, $(\bar{y}, \bar{u})$, by solving the 
first-order necessary optimality system 


$$-\Delta \bar{y} + G(\bar{y}) - H(p) = 0 \quad \text{in } \Omega, \qquad (5a)$$
$$-\Delta p + G_y(\bar{y})\,p + \varphi_y(\bar{y}) = 0 \quad \text{in } \Omega. \qquad (5b)$$


Here the boundary conditions of the original problem (1) are preserved, $G_y$ denotes 
the Fréchet derivative of $G$ with respect to the state variable, $y$, and the optimal 
control satisfies $\bar{u} = H(p) := \max(u_a, \min(\rho^{-1} p(x), u_b))$. In the special case 
$U_{ad} := L^2(\Omega)$, it can be shown that $\bar{u} = H(p) = \rho^{-1} p$ [18]. In the concrete 
case of annular domains (2), the system (5) can be recast to polar coordinates. To 
this end, define the functions 


$$Y(t,\theta) := y(r(t)\cos\theta,\ r(t)\sin\theta), \qquad P(t,\theta) := p(r(t)\cos\theta,\ r(t)\sin\theta), \qquad (6)$$

where $r(t) := \frac{b-a}{2}(t + c)$, $t \in [-1, 1]$, $c = \frac{b+a}{b-a}$. The optimality system then reads 

$$-\Delta_t Y + \kappa\,G(Y) - \kappa\,H(P) = 0 \quad \text{in } \Omega_\kappa, \qquad (7a)$$
$$-\Delta_t P + \kappa\,G_y(Y)\,P + \kappa\,\varphi_y(Y) = 0 \quad \text{in } \Omega_\kappa, \qquad (7b)$$

where $\Delta_t Y = ((t + c)Y_t)_t + \frac{1}{t+c} Y_{\theta\theta}$, $\kappa = \frac{(b-a)^2}{4}(t + c)$ and $\Omega_\kappa = [-1, 1] \times [0, 2\pi)$. 
To solve the KKT conditions (7), the SSN scheme considers the system as 
an operator equation $F(y, p) = 0$ and solves it by generating a recursive sequence 
of iterates, $x_i := (Y_i, P_i)$, $1 \le i \le k$, where the next iterate, $x_{k+1} := (Y, P)$, is 
found by solution of the linearized optimality conditions: 


$$-\Delta_t Y + C_0(x_k)\,Y - C_1(x_k)\,P = f(x_k) \quad \text{in } \Omega_\kappa, \qquad (8a)$$
$$-\Delta_t P + C_0(x_k)\,P + C_2(x_k)\,Y = g(x_k) \quad \text{in } \Omega_\kappa. \qquad (8b)$$

Here $C_0(x_k) := \kappa\,G_y(Y_k)$, $C_1(x_k) := \kappa\,H_p(P_k)$, $C_2(x_k) := \kappa\,(G_{yy}(Y_k)P_k + \varphi_{yy}(Y_k))$ and 

$$f(x_k) := \kappa\,\big(G_y(Y_k)Y_k - G(Y_k) - (H_p(P_k)P_k - H(P_k))\big), \qquad (9a)$$
$$g(x_k) := \kappa\,\big(G_{yy}(Y_k)P_kY_k + \varphi_{yy}(Y_k)Y_k - \varphi_y(Y_k)\big), \qquad (9b)$$


where $H_p$ denotes the generalized Newton derivative of $H$ with respect to the 
adjoint variable, $P$, i.e., 


$$H_p(P) = \frac{1}{\rho}\begin{cases} 1 & \text{if } u_a < \rho^{-1}P < u_b, \\ 0 & \text{otherwise.} \end{cases} \qquad (10)$$
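
To make the role of the projection H and its generalized derivative concrete, the following is a minimal sketch of both maps applied point-wise to a sampled adjoint variable. It assumes the forms H(p) = max(u_a, min(p/ρ, u_b)) and (10) as stated above; the function names `H` and `H_p` and the sample values are illustrative choices, not taken from the authors' MATLAB code.

```python
import numpy as np

def H(p, rho, u_a, u_b):
    """Projected control H(p) = max(u_a, min(p / rho, u_b)), applied point-wise."""
    return np.maximum(u_a, np.minimum(p / rho, u_b))

def H_p(p, rho, u_a, u_b):
    """Generalized derivative (10): 1/rho where the bounds are inactive, else 0."""
    inactive = (u_a < p / rho) & (p / rho < u_b)
    return np.where(inactive, 1.0 / rho, 0.0)

# Illustrative adjoint samples: two points hit a bound, one lies strictly inside
p = np.array([-2.0, 0.1, 3.0])
print(H(p, rho=0.5, u_a=-1.0, u_b=1.0))
print(H_p(p, rho=0.5, u_a=-1.0, u_b=1.0))
```

Where a bound is active the derivative vanishes, so the coefficient C_1 in (8a) is switched off at those points; this is the source of the non-smoothness that the SSN scheme has to handle.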


2.1 Numerical Challenges: Discretization of the SSN 
Subproblems 


As a numerical challenge, the SSN scheme relies on successive solution of coupled 
PDEs in the form (8). Upon discretization, this leads to repeated solution of 
large saddle-point problems. To illustrate the associated difficulties, consider a 
spectral-Galerkin discretization of the linear subproblems (8). To this end, define 
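the boundary-adapted approximation spaces 


$$V_N := \{v \in P_N : a_\pm v(\pm 1) + b_\pm v'(\pm 1) = 0\}, \qquad F_M := \mathrm{span}\{e^{ik\theta} : -M/2 \le k \le M/2 - 1\}. \qquad (11)$$

Let $K := N \cdot M$ and define $S_K := V_N \times F_M$. The discrete Galerkin approximation 
of (8) then seeks to find $Y, P \in S_K$ such that 

$$((t + c)Y_t, v_t) + ((t + c)^{-1}Y_\theta, v_\theta) + (C_0 Y - C_1 P, v) = (f, v) \quad \forall v \in S_K, \qquad (12a)$$
$$((t + c)P_t, v_t) + ((t + c)^{-1}P_\theta, v_\theta) + (C_0 P + C_2 Y, v) = (g, v) \quad \forall v \in S_K, \qquad (12b)$$

where $(v, w) := \int_0^{2\pi}\int_{-1}^{1} v\,\bar{w}\,dt\,d\theta$. To represent the approximate solutions, $Y_{N,M}$ 
and $P_{N,M}$, consider the truncated series expansions 


$$Y_{N,M}(t,\theta) := \sum_{k=-M/2}^{M/2-1}\sum_{m=0}^{N-2} \hat{y}_{l(k),m}\,\psi_m(t)\,e^{ik\theta}, \qquad P_{N,M}(t,\theta) := \sum_{k=-M/2}^{M/2-1}\sum_{m=0}^{N-2} \hat{p}_{l(k),m}\,\psi_m(t)\,e^{ik\theta}, \qquad (13)$$

where $l(k) := k + M/2$. Now, define the $(N - 1) \times (N - 1)$ matrices associated with 
the basis $\{\psi_k\}_{k=0}^{N-2}$: 

$$a_{ij} = ((c + t)\,\psi_j', \psi_i'), \qquad A = (a_{ij})_{i,j=0,\dots,N-2}, \qquad (14)$$
$$b_{ij} = ((c + t)^{-1}\psi_j, \psi_i), \qquad B = (b_{ij})_{i,j=0,\dots,N-2}. \qquad (15)$$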
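Note that appropriate choices of the basis functions $\{\psi_k\}_{k=0}^{N-2} \subset V_N$ will be 
constructed in Sect. 3. Further, let $\Gamma$ and $\Sigma$ denote the $M \times M$ diagonal matrices 
defined by 

$$\Gamma_{mn} = (e^{im\theta}, e^{in\theta}) = 2\pi\,\delta_{mn}, \qquad \Sigma_{mn} = mn\,(e^{im\theta}, e^{in\theta}) = 2\pi\,mn\,\delta_{mn}, \qquad (16)$$

where $\delta_{mn}$ denotes the Kronecker delta. Finally, consider the $(MN \times 1)$ vectors 


$$\hat{Y} := (\hat{y}_0, \dots, \hat{y}_{M-1})^T, \qquad \hat{y}_k = (\hat{y}_{k,0}, \dots, \hat{y}_{k,N-2}), \qquad (17)$$
$$\hat{P} := (\hat{p}_0, \dots, \hat{p}_{M-1})^T, \qquad \hat{p}_k = (\hat{p}_{k,0}, \dots, \hat{p}_{k,N-2}), \qquad (18)$$
$$G := (g_0, \dots, g_{M-1})^T, \qquad g_k = \big((g, \psi_m e^{ik\theta})\big)_{m=0}^{N-2}, \qquad (19)$$
$$F := (f_0, \dots, f_{M-1})^T, \qquad f_k = \big((f, \psi_m e^{ik\theta})\big)_{m=0}^{N-2}. \qquad (20)$$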


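The discretized linear subproblem (8) can then be written in matrix form 


$$\underbrace{\begin{bmatrix} M_{C_2} & \mathcal{B} + M_{C_0} \\ \mathcal{B} + M_{C_0} & -M_{C_1} \end{bmatrix}}_{\mathcal{A}_k} \underbrace{\begin{bmatrix} \hat{Y} \\ \hat{P} \end{bmatrix}}_{x_k} = \underbrace{\begin{bmatrix} G \\ F \end{bmatrix}}_{b_k}, \qquad (21)$$


where $\mathcal{B} = \Gamma \otimes A + \Sigma \otimes B$. Here the matrices $M_{C_\ell}$, $\ell = 0, 1, 2$, are defined by the 
elements 


$$(M_{C_\ell})_{ij} = (C_\ell\,\psi_k e^{in\theta},\ \psi_l e^{im\theta}), \qquad (22)$$

where $i, j$ satisfy that 


$$i = n(N-1) + (k+1), \qquad j = m(N-1) + (l+1), \qquad (23a)$$
$$0 \le k, l \le N-2, \qquad 0 \le n, m \le M-1. \qquad (23b)$$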


3 New Poisson-Like Preconditioners 


As a significant challenge to the numerical solution of (7), the SSN scheme relies 
on repeated solution of saddle-point problems (21) of dimension $2(N - 1)M \times 
2(N - 1)M$. Consequently, direct solution strategies often become computationally 
intractable. As a cost-efficient alternative, the following introduces new 
preconditioners that seek to accelerate the inner SSN subproblems (8) by using appropriate 
Krylov subspace methods to solve the associated preconditioned linear systems 


$$\mathcal{P}_k^{-1}\mathcal{A}_k\,x_k = \mathcal{P}_k^{-1} b_k. \qquad (24)$$


Concretely, this paper proposes approximate constraint preconditioners of the type 


$$\mathcal{P}_k = \begin{bmatrix} \bar{M}_{C_2} & \bar{\mathcal{B}} + \bar{M}_{C_0} \\ \bar{\mathcal{B}} + \bar{M}_{C_0} & -\bar{M}_{C_1} \end{bmatrix}. \qquad (25)$$


Following ideas of traditional Poisson preconditioners, the new preconditioners 
are constructed by approximating each block of the SSN subproblem (21) by the 
matrices, $\bar{\mathcal{B}}$ and $\bar{M}_{C_\ell}$, $\ell = 0, 1, 2$, that come from a spectral Galerkin discretization 
of the corresponding constant-coefficient problem that determines $Y, P \in S_K$ such 
that 


$$(C_A Y_t, v_t) + (C_B Y_\theta, v_\theta) + (\bar{C}_0 Y - \bar{C}_1 P, v) = (f, v) \quad \forall v \in S_K, \qquad (26a)$$
$$(C_A P_t, v_t) + (C_B P_\theta, v_\theta) + (\bar{C}_0 P + \bar{C}_2 Y, v) = (g, v) \quad \forall v \in S_K, \qquad (26b)$$

where $C_A = c$, $C_B = \frac{c}{c^2 - 1}$ and $\bar{C}_i = \frac{1}{2}\big[\max_{\Omega} C_i(x_k) + \min_{\Omega} C_i(x_k)\big]$, $i = 0, 1, 2$. 
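
As a brief plausibility check, the constants C_A and C_B can be recovered from the same max/min midpoint rule that defines the averaged coefficients. The derivation below is a sketch under the assumption (not stated explicitly in the text) that this rule is applied to the variable coefficients t + c and 1/(t + c) of (12) on t ∈ [-1, 1], with c = (b + a)/(b - a):

```latex
\begin{align*}
C_A &= \tfrac{1}{2}\Big[\max_{t\in[-1,1]}(t+c) + \min_{t\in[-1,1]}(t+c)\Big]
     = \tfrac{1}{2}\big[(c+1) + (c-1)\big] = c, \\
C_B &= \tfrac{1}{2}\Big[\max_{t\in[-1,1]}\frac{1}{t+c} + \min_{t\in[-1,1]}\frac{1}{t+c}\Big]
     = \tfrac{1}{2}\Big[\frac{1}{c-1} + \frac{1}{c+1}\Big] = \frac{c}{c^{2}-1}.
\end{align*}
```

For an annulus with a = 30 and b = 60 this would give c = 3, and hence C_A = 3 and C_B = 3/8.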


To be efficient, the new preconditioners crucially rely on carefully chosen basis 
functions $\{\psi_k\}_{k=0}^{N-2}$ for the discrete approximation space, $V_N$ (11). To this end, this 
paper uses Fourier-like (FL) bases that were originally introduced by Shen and 
Wang in the context of traditional initial-boundary-value problems [17]. As a key 
property for the construction of the preconditioners, the FL bases lead to diagonal mass 
and stiffness matrices, i.e., 


$$M_{ij} = (\psi_j, \psi_i) = \lambda_i\,\delta_{ij}, \qquad S_{ij} = (\partial_t\psi_j, \partial_t\psi_i) = \delta_{ij}. \qquad (27)$$

The FL bases can be constructed as part of an offline preprocessing stage in two 
steps: 


1. Let $\{L_k\}_{k=0}^{N}$ be the Legendre polynomials. Then there exists a unique set of 
coefficients $\{a_k, b_k\}_{k=0}^{N-2}$ such that 


$$\phi_k = c_k\,(L_k + a_k L_{k+1} + b_k L_{k+2}) \in V_N, \qquad c_k = \big(-b_k(4k + 6)\big)^{-1/2}.$$


Furthermore, the mass matrix, $M_\phi = ((\phi_j, \phi_i))_{ij}$, is penta-diagonal and 
symmetric positive definite, whereas the stiffness matrix, $S_\phi = ((\partial_t\phi_j, \partial_t\phi_i))_{ij}$, 
becomes diagonal [15]. In the concrete cases of Dirichlet and Neumann boundary 
conditions, the coefficients, $\{a_k, b_k\}$, are given by respectively 


$$a_k = 0,\ b_k = -1 \qquad \text{and} \qquad a_k = 0,\ b_0 = 1/2,\ b_k = -\frac{k(k+1)}{(k+2)(k+3)}. \qquad (28)$$

2. The second step computes the diagonalization $\Lambda = Q^T M_\phi Q$, where $Q = (q_{ij})$ 
denotes the matrix of eigenvectors and $\{\lambda_k\}_{k=0}^{N-2}$ are the associated eigenvalues. 
Using the matrix $Q$, the FL basis can be constructed by the linear combinations: 


$$\psi_k(t) = \sum_{j=0}^{N-2} q_{jk}\,\phi_j(t), \qquad 0 \le k \le N-2. \qquad (29)$$
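
The two construction steps can be sketched numerically for the homogeneous Dirichlet case. The example below is an illustration only: it assumes the modal basis φ_k = (L_k - L_{k+2})/sqrt(4k + 6), assembles the mass and stiffness matrices by Gauss-Legendre quadrature and diagonalizes the mass matrix with a symmetric eigensolver. The function names are hypothetical and the code is not the authors' MATLAB implementation.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def dirichlet_modal_matrices(N):
    """Mass and stiffness matrices of phi_k = c_k (L_k - L_{k+2}), k = 0..N-2."""
    x, w = leg.leggauss(N + 2)            # exact for polynomial degree <= 2N + 3
    n = N - 1
    phi = np.zeros((n, x.size))
    dphi = np.zeros((n, x.size))
    for k in range(n):
        coef = np.zeros(k + 3)
        coef[k], coef[k + 2] = 1.0, -1.0  # a_k = 0, b_k = -1 (Dirichlet)
        coef /= np.sqrt(4 * k + 6)        # c_k = (-b_k (4k + 6))^(-1/2)
        phi[k] = leg.legval(x, coef)
        dphi[k] = leg.legval(x, leg.legder(coef))
    M = (phi * w) @ phi.T                 # step 1: mass matrix (banded, SPD)
    S = (dphi * w) @ dphi.T               # step 1: stiffness matrix (identity here)
    return M, S

def fourier_like_transform(M):
    """Step 2: eigen-decomposition; column k of Q holds the coefficients q_jk."""
    lam, Q = np.linalg.eigh(M)
    return lam, Q

M, S = dirichlet_modal_matrices(N=16)
lam, Q = fourier_like_transform(M)
print(np.allclose(Q.T @ M @ Q, np.diag(lam)), np.allclose(Q.T @ S @ Q, np.eye(len(lam))))
```

Because Q is orthogonal and the modal stiffness matrix is the identity in this normalization, the transformed basis keeps an identity stiffness matrix while the mass matrix becomes diagonal, which is the property (27) relied upon by the preconditioners.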


3.1 Efficient Inversion of the Preconditioners 


As the main feature of the preconditioners, $\mathcal{P}_k$, the following describes an efficient 
inversion procedure that exploits the orthogonal structures of the FL bases (27). To 
this end, consider the following preconditioning problem that is solved during each 
iteration of the KSP method: 


$$\underbrace{\begin{bmatrix} \bar{M}_{C_2} & \bar{\mathcal{B}} + \bar{M}_{C_0} \\ \bar{\mathcal{B}} + \bar{M}_{C_0} & -\bar{M}_{C_1} \end{bmatrix}}_{\mathcal{P}_k} \underbrace{\begin{bmatrix} \hat{Y}^k \\ \hat{P}^k \end{bmatrix}}_{z_k} = \begin{bmatrix} \hat{G}^k \\ \hat{F}^k \end{bmatrix}. \qquad (30)$$


Note that (30) corresponds to the discrete first-order necessary optimality conditions 
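associated with the constant-coefficient optimal control problem (26). Hence, by 
definition (22), it follows that 


$$\bar{\mathcal{B}} = C_A\,\Gamma \otimes S + C_B\,\Sigma \otimes M, \qquad \bar{M}_{C_\ell} = \bar{C}_\ell\,\Gamma \otimes M, \qquad (31)$$


where $S_{ij} = (\partial_t\psi_j, \partial_t\psi_i)$ and $M_{ij} = (\psi_j, \psi_i)$. Further, by the orthogonal 
properties of the Fourier bases (16), the matrices, $\Gamma$ and $\Sigma$, are diagonal. Therefore, 
using the notation, 


$$\hat{Y}_l^k = \{\hat{y}_{lm}^k\}_{m=0}^{N-2}, \quad \hat{P}_l^k = \{\hat{p}_{lm}^k\}_{m=0}^{N-2}, \quad \hat{G}_l^k = \{\hat{G}_{lm}^k\}_{m=0}^{N-2}, \quad \hat{F}_l^k = \{\hat{F}_{lm}^k\}_{m=0}^{N-2},$$


it follows that the preconditioning problem (30) can be written as $M$ independent 
linear systems 


$$\begin{bmatrix} 2\pi\bar{C}_2 M & X_l \\ X_l & -2\pi\bar{C}_1 M \end{bmatrix} \begin{bmatrix} \hat{Y}_l^k \\ \hat{P}_l^k \end{bmatrix} = \begin{bmatrix} \hat{G}_l^k \\ \hat{F}_l^k \end{bmatrix}, \qquad 0 \le l \le M - 1, \qquad (32)$$


where $X_l := C_A S + (C_B\,k(l)^2 + 2\pi\bar{C}_0)M$. In addition, the properties of the FL 
basis, $\{\psi_k\}_{k=0}^{N-2}$, imply that $S$ and $M$ become diagonal (29). Hence, the system 
(32) reduces to $M(N - 1)$ independent 2 × 2 linear systems in the form 


$$\begin{bmatrix} 2\pi\bar{C}_2\lambda_m & \sigma_{lm} \\ \sigma_{lm} & -2\pi\bar{C}_1\lambda_m \end{bmatrix} \begin{bmatrix} \hat{y}_{lm}^k \\ \hat{p}_{lm}^k \end{bmatrix} = \begin{bmatrix} \hat{G}_{lm}^k \\ \hat{F}_{lm}^k \end{bmatrix}, \qquad 0 \le l \le M-1,\ 0 \le m \le N-2, \qquad (33)$$


where $\sigma_{lm} := C_A + (C_B\,k(l)^2 + 2\pi\bar{C}_0)\lambda_m$. By (33), it follows that the original 
preconditioning problem (30) decouples into $(N - 1)M$ independent 2 × 2 
subsystems. As a consequence, the Poisson-like preconditioners (25) scale linearly 
with the problem size and can be applied matrix-free. 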
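
The decoupled structure lends itself to a fully vectorized, matrix-free application. The following is a minimal sketch that solves all 2 × 2 systems at once by Cramer's rule, assuming the form of (33) as read here (diagonal entries 2πC_2 λ_m and -2πC_1 λ_m, off-diagonal entry σ_lm); the function name `apply_preconditioner`, the argument layout and the sample constants are hypothetical.

```python
import numpy as np

def apply_preconditioner(G, F, lam, k, C_A, C_B, C0, C1, C2):
    """Solve the (N-1)M decoupled 2x2 systems by Cramer's rule.

    G, F : right-hand sides, arrays of shape (M, N-1)
    lam  : eigenvalues lambda_m of the FL mass matrix, shape (N-1,)
    k    : Fourier wave numbers k(l), shape (M,)
    """
    sigma = C_A + (C_B * k[:, None] ** 2 + 2 * np.pi * C0) * lam[None, :]
    a = 2 * np.pi * C2 * lam[None, :]      # (1,1) entry of each 2x2 block
    d = -2 * np.pi * C1 * lam[None, :]     # (2,2) entry of each 2x2 block
    det = a * d - sigma ** 2               # nonzero if C1*C2 >= 0 and sigma != 0
    Y = (d * G - sigma * F) / det
    P = (a * F - sigma * G) / det
    return Y, P

# Illustrative data: M = 8 Fourier modes, N - 1 = 5 radial modes
rng = np.random.default_rng(0)
lam = rng.uniform(0.1, 1.0, 5)
k = np.arange(-4, 4).astype(float)
G = rng.standard_normal((8, 5))
F = rng.standard_normal((8, 5))
Y, P = apply_preconditioner(G, F, lam, k, C_A=3.0, C_B=0.375, C0=1.0, C1=2.0, C2=0.5)
```

Each entry of Y and P costs a fixed number of floating-point operations, which is consistent with the linear scaling in the number of unknowns, and the element-wise operations are independent and therefore straightforward to parallelize.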


4 Numerical Results 


To investigate the potential of the Poisson-like preconditioners, the following case 
study solves the control problem (1), where the reaction term is given by the cubic 
non-linearity $G(y) := y^3$. The corresponding problem serves as a recurring example 
in the control literature [18]. In this case study, the goal is to track the desired state 
of the type 


Hoes Z, (r,0) И [a, 8] х [0, л/2] U [л, л/3] l (34) 
0, otherwise 


where $a < \alpha < \beta \le b$. The following example uses the parameters, $Z = 4$, 
$a = 30$, $\alpha = 40$ and $\beta = b = 60$. The main purpose of the study is to 
investigate efficiency and robustness of the preconditioners (25). To this end, the 
study solves (1) for different choices of (1) problem size, (2) boundary conditions, 
(3) regularization parameter, and (4) point-wise bound constraints of the type (3).¹ 
As a benchmark reference, the results are compared to MATLAB's state-of-the-art 
direct solver. All computations are carried out in MATLAB [11] on a 2.9 GHz Intel processor. 
The SSN scheme is said to have converged when the 2-norm difference between 
successive iterates is below $\eta = 10^{-4}$. The KSP iterations are performed using the 
MATLAB function GMRES with a tolerance of $\epsilon = 10^{-9}$. The direct solver relies 
on MATLAB's backslash command. Table 1 lists the results, where KSP iter 
denotes the average number of KSP iterations required for each SSN step. Note 
also that DOF denotes the number of degrees of freedom for each individual SSN 
subproblem. Hence, the total number of degrees of freedom, DOF_T, is given by 
#SSN steps × DOF. The results reflect some overall tendencies that generalize 
to other choices of the parameters, $Z$, $a$, $\alpha$, $\beta$ and $b$. Firstly, the preconditioners 
provide significant reductions in CPU-time compared to the direct strategy. In 
particular, the results show that the non-linear control problem with up to DOF_T = 
875,000 unknowns can be solved in less than a minute using modest hardware. 
Secondly, the preconditioners prove robust with respect to the problem size and 
the choice of boundary conditions. Thirdly, as a drawback, the numbers of SSN 
steps and KSP iterations increase as the point-wise bounds become stricter. The 
authors suspect that these increases in SSN steps and KSP iterations are caused by 
the combination of a decrease in regularity of the solution and an increase in 
non-linearity of the KKT system (Fig. 1). 


¹ By the choices of parameters, the study strives to provide a representative example of the general 
tendencies of performance and robustness that can be expected from the preconditioners. To allow 
for more diverse and elaborate experiments, the MATLAB source code of this study has been made 
publicly available from https://github.com/LHCH-DK/PDE Control Annular git. 

[Figure: three surface plots over the (x, y) plane with panel titles "State - Dirichlet BCs", "State - Neumann BCs" and "Desired State"] 


Fig. 1 The computed states for (1) Dirichlet boundary conditions, (2) Neumann boundary 
conditions and (3) the desired state for ua = —35, up = 35, р = 107^ . Note that both solutions 
manage to approximate the desired state well, despite the bound constraints 


5 Conclusions and Outlook 


This paper has proposed new Poisson-like preconditioners for semi-linear PDE- 
constrained optimization problems with non-linear reaction kinetics and point-wise 
bound constraints. The preconditioners specifically target problems in annular 
domains. Inspired by [16], the new preconditioners exploit the orthogonal prop- 
erties of customized, boundary-adapted spectral bases. This leads to matrix-free 
preconditioners that scale linearly with the problem size. Numerical results have 
demonstrated that the preconditioners lead to fast solution of large-scale optimization 
problems with significant computational benefits compared to MATLAB's 
state-of-the-art direct methods. Furthermore, the preconditioners have proven to 
be robust with respect to the problem size for both homogeneous Dirichlet and 
Neumann boundary conditions. As a challenge, numerical experiments indicated 
that the non-linearity of the problem increases as the point-wise bound constraints 
become more strict. In turn, this leads to an increase in the number of SSN steps 
and KSP iterations that are required to reach convergence. A future study seeks to 
improve this situation by providing the SSN scheme with an educated starting guess 
that uses a coarse-grid solution to a similar control problem with less restrictive 
constraints. 


DIRK Schemes with High Weak Stage Order 


David I. Ketcheson, Benjamin Seibold, David Shirokoff, and Dong Zhou 


1 Introduction 


Runge-Kutta (RK) methods achieve high-order accuracy in time by means of 
combining approximations to the solution at multiple stages. An s-stage RK scheme 
can be represented via the Butcher tableau 


$$\begin{array}{c|c} c & A \\ \hline & b^T \end{array} \;=\; \begin{array}{c|ccc} c_1 & a_{11} & \cdots & a_{1s} \\ \vdots & \vdots & \ddots & \vdots \\ c_s & a_{s1} & \cdots & a_{ss} \\ \hline & b_1 & \cdots & b_s \end{array}$$

Throughout the whole paper we assume that $c = Ae$, where $e$ is the vector of all 
ones. The scheme's stability function [12] $R(\zeta) = 1 + \zeta\,b^T (I - \zeta A)^{-1} e$ measures 
the growth $u^{n+1}/u^n$ per step $\Delta t$, when applying the scheme to the linear model 
equation $u'(t) = \lambda u$, with $\zeta = \lambda\Delta t$. 
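
For a given tableau, the stability function can be evaluated directly from its definition. The snippet below is a small sketch using a linear solve instead of an explicit inverse; the function name `stability_function` is an illustrative choice, and backward Euler serves as a one-stage example for which R(ζ) = 1/(1 - ζ).

```python
import numpy as np

def stability_function(A, b, zeta):
    """Evaluate R(zeta) = 1 + zeta * b^T (I - zeta A)^{-1} e."""
    s = A.shape[0]
    e = np.ones(s)
    return 1.0 + zeta * (b @ np.linalg.solve(np.eye(s) - zeta * A, e))

# Backward Euler as a one-stage example
A = np.array([[1.0]])
b = np.array([1.0])
print(stability_function(A, b, -1.0))     # moderate step
print(stability_function(A, b, -1.0e8))   # very stiff step, R tends to 0
```

Complex arguments work as well, so the same function can be used to sample |R(ζ)| along the imaginary axis when checking A-stability numerically.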

A particular interest lies in the accuracy of the RK scheme for stiff problems, 
i.e., problems in which a larger time step is chosen than the fastest time scale of the 
problem's dynamics. A standard stiff model problem [8] is the scalar linear ordinary 
differential equation (ODE) 


$$u' = \lambda\,(u - \phi(t)) + \phi'(t), \qquad (1)$$


with i.c. $u(0) = \phi(0)$ and $\mathrm{Re}\,\lambda \le 0$. The true solution $u(t) = \phi(t)$ evolves on an 
$O(1)$ time scale. Hence, $\lambda$-values with large negative real part result in stiffness. 
Considering a family of test problems (parametrized by $\lambda$), one can now establish 
the scheme's convergence via two different limits: (a) the non-stiff limit $\Delta t \to 0$ 
and $\zeta \to 0$; and (b) the stiff limit $\Delta t \to 0$ and $\zeta \to -\infty$. A characteristic property 
of most RK schemes is that, while the non-stiff limit recovers the scheme's order 
(as given by the order conditions [2, 5]), the error decays at a reduced order in 
the stiff limit. This phenomenon is called "order reduction" (OR) [1, 3, 7, 10, 11] 
and it manifests in various ways for more complex problems, including numerical 
boundary layers [6]. The OR phenomenon can be seen by studying the RK scheme 
applied to (1). The approximation error at time $t_{n+1}$ reads [12, Chapter IV.15] 


$$\epsilon^{n+1} = R(\zeta)\,\epsilon^{n} + \zeta\,b^T (I - \zeta A)^{-1}\delta_s^{n+1} + \delta^{n+1}, \qquad (2)$$


where $R(\zeta)$ is the growth factor, and 


$$\delta_s^{n+1} = \sum_{j \ge 2} \frac{\Delta t^j}{(j-1)!}\,\tau^{(j)}\,\phi^{(j)}(t_n), \qquad \delta^{n+1} = \sum_{j \ge 1} \frac{\Delta t^j}{(j-1)!}\Big(b^T c^{j-1} - \frac{1}{j}\Big)\phi^{(j)}(t_n)$$


are the truncation errors incurred at the intermediate stages and at the end of the step, 
respectively. Here, $\phi^{(j)}$ denotes the j-th derivative of the solution, and the vectors 


$$\tau^{(j)} = A c^{j-1} - \frac{1}{j}\,c^{j}, \qquad j = 1, 2, \dots$$

we call the stage order residuals or stage order vectors. The condition $\tau^{(n)} = 0$ for 
$0 < n \le j$ appears often in the literature and is also referred to as the simplifying 
assumption $C(j)$ [12]. In (2), the step error $\delta^{n+1}$ is of the formal order (in $\Delta t$) 
of the scheme (due to the order conditions). Moreover, the growth factor carries 
over (more or less, see [4]) the accuracy from one step to the next. Hence, the 
critical expression for OR is the term involving the stage error $\delta_s^{n+1}$. Specifically, 
the asymptotic behavior of the expression 


$$g^{(j)} = \zeta\,b^T (I - \zeta A)^{-1}\tau^{(j)} \qquad (3)$$

matters. In the non-stiff limit ($|\zeta| \ll 1$), a Neumann expansion yields $\zeta (I - \zeta A)^{-1} = 
\zeta I + \zeta^2 A + \zeta^3 A^2 + \dots$, leading to expressions $b^T A^\ell \tau^{(j)}$ with $\ell \ge 0$. And in fact 
the order conditions guarantee that $b^T A^\ell \tau^{(j)} = 0$ for $0 \le \ell + j \le p - 1$ to ensure 
the formal order of the scheme. 

Conversely, in the stiff limit we can treat $\zeta^{-1}$ as the small parameter and expand 
$\zeta (I - \zeta A)^{-1} = -A^{-1}(I - \zeta^{-1}A^{-1})^{-1} = -A^{-1} - \zeta^{-1}A^{-2} - \zeta^{-2}A^{-3} - \dots$, leading 
to expressions $b^T A^\ell \tau^{(j)}$ with $\ell < 0$. The order conditions do not imply that these 
quantities vanish, and in general one may observe a reduced rate of convergence. 

A key question is therefore whether additional conditions can be imposed on 
the RK scheme that recover the scheme's order in the stiff regime. A well-known 
answer to the question is: 


Definition 1 Let $\hat{p}$ denote the order of the quadrature rule of an RK scheme. Let $\hat{q}$ 
denote the largest integer such that $\tau^{(j)} = 0$ for $1 \le j \le \hat{q}$. The stage order of a 
RK scheme is $q = \min(\hat{p}, \hat{q})$. 


Having stage order q implies that the error decays at an order of (at least) q in the 
stiff regime (see also [12]). This work focuses particularly on diagonally-implicit 
Runge-Kutta (DIRK) schemes, for which A is lower triangular. A known drawback 
of DIRK schemes is that they cannot have high stage order: 


Theorem 1 The stage order of an irreducible DIRK scheme is at most 2. The stage 
order of a DIRK scheme with non-singular A is at most 1. 


Proof Since $c = Ae$, we have $\tau_1^{(2)} = a_{11}c_1 - \frac{1}{2}(c_1)^2 = \frac{1}{2}(a_{11})^2$. Thus if A is 
non-singular, one has $\tau^{(2)} \ne 0$, so $q \le 1$. Consider now the case that $a_{11} = c_1 = 0$, 
and suppose that the method has stage order 3. The conditions $\tau_2^{(2)} = \tau_2^{(3)} = 0$ 
then imply $a_{21} = a_{22} = c_2 = 0$, which would render the scheme reducible. Hence, 
$q \le 2$. □ 


Hence, while DIRK schemes possess an implementation-friendly structure (each 
stage is a backward-Euler-type solve), their potential to avoid OR by means of high 
stage order is limited. We therefore move to a weaker condition that can avoid OR 
in some situations for higher order in the context of DIRK schemes. 


2 Weak Stage Order 


To avoid order reduction, the expressions $g^{(j)}$ in (3) need to vanish in the stiff limit. 
In line with [9], we define the following criteria: 


Definition 2 (Weak Stage Order) A RK scheme has weak stage order (WSO) $\tilde{q}$ if 
there is an A-invariant subspace that is orthogonal to $b$ and that contains the stage 
order vectors $\tau^{(j)}$ for $1 \le j \le \tilde{q}$. 


Theorem 2 (WSO Is the Most General Condition that Ensures $g^{(j)} = 0$ for All 
$\zeta > 0$) Let coefficients $A$, $b$ be given. Then $g^{(j)} = 0$ for all $\zeta > 0$ and $1 \le j \le \tilde{q}$ 
if and only if the corresponding RK scheme has weak stage order $\tilde{q}$. 


Proof Let $C(G)$ denote the column space of 
$$G := \big[\tau^{(1)}, A\tau^{(1)}, \dots, A^{s-1}\tau^{(1)}, \tau^{(2)}, A\tau^{(2)}, \dots, A^{s-1}\tau^{(\tilde{q})}\big].$$
From the Cayley-Hamilton theorem it follows that WSO $\tilde{q}$ is equivalent to 
$$b^T A^\ell \tau^{(j)} = 0, \qquad 0 \le \ell \le s-1,\ 1 \le j \le \tilde{q}. \qquad (4)$$


Because $C(G)$ is A-invariant, $C(G)$ is invariant under multiplication by $(I - \zeta A)^{-1}$, 
i.e. if $v \in C(G)$ then for any $\zeta > 0$, the product $(I - \zeta A)^{-1}v \in C(G)$. 
Since $b$ is orthogonal to $C(G)$, we have $g^{(j)} = 0$ for all $1 \le j \le \tilde{q}$. 

If $g^{(j)} = 0$, then $\zeta^{-1} g^{(j)} = b^T (I - \zeta A)^{-1}\tau^{(j)} = 0$ for all $\zeta > 0$. 
Differentiating both sides of this equation $\ell$-times, with respect to $\zeta$, and taking 
the limit as $\zeta \to 0^+$, yields the conditions in Eq. (4). □ 
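
Condition (4) gives a direct numerical test for weak stage order. The following is a minimal sketch, assuming the stage order residuals τ^(j) = A c^(j-1) - c^j / j with c = Ae and checking b^T A^ℓ τ^(j) = 0 for ℓ = 0, ..., s-1 up to a tolerance; the function names, the cap `q_max` and the tolerance are hypothetical choices, and the two tableaus used for illustration (backward Euler and the trapezoidal rule) are standard schemes rather than the schemes constructed in this paper.

```python
import numpy as np

def stage_order_residual(A, j):
    """tau^(j) = A c^(j-1) - c^j / j with c = A e."""
    c = A @ np.ones(A.shape[0])
    return A @ c ** (j - 1) - c ** j / j

def weak_stage_order(A, b, q_max=10, tol=1e-10):
    """Largest q <= q_max with b^T A^l tau^(j) = 0 for 0 <= l <= s-1, 1 <= j <= q."""
    s = A.shape[0]
    q = 0
    for j in range(1, q_max + 1):
        v = stage_order_residual(A, j)
        for _ in range(s):                 # l = 0, 1, ..., s-1
            if abs(b @ v) > tol:
                return q
            v = A @ v
        q = j
    return q

backward_euler = (np.array([[1.0]]), np.array([1.0]))
trapezoidal = (np.array([[0.0, 0.0], [0.5, 0.5]]), np.array([0.5, 0.5]))
print(weak_stage_order(*backward_euler), weak_stage_order(*trapezoidal))
```

When applied to tableaus with coefficients given to finite precision, the tolerance has to be loosened accordingly, since the products b^T A^ℓ τ^(j) then vanish only up to the rounding of the tabulated digits.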


Definition 3 (Weak Stage Order Eigenvector Criterion) A RK scheme satisfies 
the WSO eigenvector criterion of order $\tilde{q}_e$ if for each $1 \le j \le \tilde{q}_e$, there exists $\mu_j$ 
such that $A\tau^{(j)} = \mu_j\tau^{(j)}$ and moreover, $b^T\tau^{(j)} = 0$. 


The WSO eigenvector criterion of order $\tilde{q}_e$ implies WSO (of at least) $\tilde{q}_e$. For a given 
scheme, let $p$ denote the classical order, $q$ the stage order, and $\tilde{q}$ the weak stage 
order. Then we have $\tilde{q} \ge q$ and $p \ge q$. Note however that a method with WSO 
$\tilde{q} \ge 1$ need not even be consistent; order conditions must be imposed separately. 
The WSO eigenvector criterion may serve to avoid OR because it implies that 


$$g^{(j)} = \zeta\,b^T (I - \zeta A)^{-1}\tau^{(j)} = \frac{\zeta}{1 - \zeta\mu_j}\,b^T\tau^{(j)} = 0,$$

i.e., it allows one to "push" the stage order residuals past the matrix $(I - \zeta A)^{-1}$, 
and then use $b^T\tau^{(j)} = 0$. Note that the condition $b^T\tau^{(j)} = 0$ that is required 
in Definition 3 is actually automatically satisfied (due to the order conditions) if 
$p > \tilde{q}_e$ (or $p \ge \tilde{q}_e$ for stiffly accurate schemes). 

It must be stressed that the concept of WSO (both criteria) is based on the linear 
test equation (1), hence it is not clear to what extent WSO will remedy OR for 
nonlinear problems or problems with time-dependent coefficients. In Sect. 4 we 
numerically investigate some nonlinear test problems. 

Finally, we present a limitation theorem on the WSO eigenvector criterion. 


Theorem 3 DIRK schemes with invertible A have $\tilde{q}_e \le 3$. 


Proof Because the $\tau^{(j)}$ only depend on A, the eigenvector relation in Definition 3 
depends only on A, not on b. With A lower triangular, the first k components 
of $\tau^{(j)}$ depend only on the upper k rows of A; and the same is true for the 
eigenvector relation as well. Hence, for a scheme to have an A that allows for 
the WSO eigenvector criterion of order $\tilde{q}_e$, all upper sub-matrices of A must admit 
the same, too. We can therefore study A row by row. The first component of $\tau^{(j)}$ 
equals $(1 - \frac{1}{j})\,a_{11}^{j}$, which is nonzero for $j > 1$. Hence, the first row of the equation 
$A\tau^{(j)} = \mu_j\tau^{(j)}$ is equivalent to $\mu_j = a_{11}$. With that, we can move to the second 
row of the equation, which reads 


$$\Big(1 - \frac{1}{j}\Big)a_{11}^{j}\,a_{21} + (a_{22} - a_{11})\Big(a_{21}a_{11}^{j-1} + a_{22}(a_{21} + a_{22})^{j-1} - \frac{1}{j}(a_{21} + a_{22})^{j}\Big) = 0. \qquad (5)$$


To determine the set of solutions $(a_{11}, a_{21}, a_{22})$ of (5), we first observe that (5) is 
homogeneous, i.e., if $(a_{11}, a_{21}, a_{22})$ solves (5), then $(\mu a_{11}, \mu a_{21}, \mu a_{22})$ solves (5) 
as well for any $\mu \in \mathbb{R}$. It therefore suffices to consider the solutions of (5) in the 
2D-plane $(a_{11}/a_{21}, a_{22}/a_{21})$. Figure 1 shows the resulting solution curves for $j \in \{2, 3, 4\}$. 

One class of solutions lies on the straight line of slope 1 passing through (1, 0). 
Those schemes are equal-time methods, i.e., RK schemes that have $c = \nu e$, where 
$\nu \in \mathbb{R}$ is a constant. In fact, equal-time schemes satisfy the eigenvector relation for 
all j. However, they are not particularly useful RK methods, because, among other 
limitations, they are restricted to second order. This follows because the order 1 
and 2 conditions require $b^T e = 1$ and $b^T c = \frac{1}{2}$. Thus $\nu = \frac{1}{2}$, and $b^T c^2 = \nu^2 = \frac{1}{4}$, 
which contradicts the order 3 condition $b^T c^2 = \frac{1}{3}$. Note that the equal-time scenario 
also covers the points at infinity in Fig. 1, i.e., the schemes with $a_{21} = 0$. 


[Figure: two panels showing the curves "order 2", "order 3" and "order 4" and the points "order 2 & order 3"; left panel axis range -10 to 10, right panel axis range -1 to 1] 

Fig. 1 Curves of WSO orders 2, 3, and 4 as functions of the re-scaled parameters $a_{11}/a_{21}$ and $a_{22}/a_{21}$. Left 
panel: scale 10; right panel: scale 1. All orders are satisfied along the line of slope 1 going through 
(1,0), corresponding to equal-time DIRK schemes. Moreover, there are two further points (other 
than the origin), where orders 2 and 3 are satisfied. Neither of these two points satisfies order 4 


Non-equal-time schemes that satisfy (5) for j = 2 and j = 3 are the following 
two points in the $(a_{11}/a_{21}, a_{22}/a_{21})$ plane: $P_1 = (-4 + 3\sqrt{2},\ \sqrt{2} - 1) \approx (0.2426, 0.4142)$ 
and $P_2 = (-(\sqrt{2} + 1)(\sqrt{2} + 2),\ -(\sqrt{2} + 1)) \approx (-8.2426, -2.4142)$. Neither of these 
two points satisfies (5) for j = 4 (green curve in Fig. 1). Therefore $\tilde{q}_e \le 3$. □ 


Among the two sets of solutions found in the proof, $P_1$ implies that $a_{11}$, $a_{21}$, and 
$a_{22}$ all have the same sign, which is a desirable property. In contrast, $P_2$ implies that 
$a_{21} < 0$. Both WSO 3 schemes presented below correspond to the $P_1$ solution. 


3 DIRK Schemes with High Weak Stage Order 


Imposing the classical order conditions [2, 5], together with the WSO eigenvector 
relation (Definition 3), we determine RK schemes by searching the parameter 
space of DIRK schemes (with all diagonal entries non-zero). A stiffly accurate 
structure ($b^T$ equals the last row of A) is imposed, as is A-stability (verified by 
evaluating the stability function $R(\zeta)$ along the imaginary axis). Together this 
implies that the resulting scheme is L-stable; i.e., it ensures that unresolved stiff 
modes decay [5]. The number of stages is chosen so that the constraints admit 
solutions. The optimization itself is carried out using MATLAB's optimization 
toolbox, using multiple local optimization algorithms included in the function 
fmincon. An effort was made to minimize the $L_2$ norm of the local truncation 
error coefficients. However, in multiple cases the solver exhibited poor convergence 
properties; so while the schemes below yield reasonable truncation errors, it should 
not be expected that they are optimal. We find an order 3 scheme with WSO 2 (see 
also [9]), 


0.01900072890 |  0.01900072890
0.78870323114 |  0.40434605601   0.38435717512
0.41643499339 |  0.06487908412  -0.16389640295   0.51545231222
1             |  0.02343549374  -0.41207877888   0.96661161281   0.42203167233
--------------+---------------------------------------------------------------
              |  0.02343549374  -0.41207877888   0.96661161281   0.42203167233

an order 3 scheme with WSO 3, 


\[
\begin{array}{c|cccc}
0.13756543551 & 0.13756543551 & & & \\
0.80179011576 & 0.56695122794 & 0.23483888782 & & \\
2.33179673002 & -1.08354072813 & 2.96618223864 & 0.44915521951 & \\
1 & 0.59761291500 & -0.43420997584 & -0.05305815322 & 0.88965521406 \\
\hline
 & 0.59761291500 & -0.43420997584 & -0.05305815322 & 0.88965521406
\end{array}
\]

and an order 4 scheme with WSO 3,

\[
\begin{array}{c|cccccc}
0.079672377876931 & 0.079672377876931 & 0 & 0 & 0 & 0 & 0 \\
0.464364648310935 & 0.328355391763968 & 0.136009256546967 & 0 & 0 & 0 & 0 \\
1.348559241946724 & -0.650772774016417 & 1.742859063495349 & 0.256472952467792 & 0 & 0 & 0 \\
1.312664210308764 & -0.714580550967259 & 1.793745752775934 & -0.078254785672497 & 0.311753794172585 & 0 & 0 \\
0.989469293495897 & -1.120092779092918 & 1.983452339867353 & 3.117393885836001 & -3.761930177913743 & 0.770646024799205 & 0 \\
1 & 0.214823667785537 & 0.536367363903245 & 0.154488125726409 & -0.217748592703941 & 0.072226422925896 & 0.239843012362853 \\
\hline
 & 0.214823667785537 & 0.536367363903245 & 0.154488125726409 & -0.217748592703941 & 0.072226422925896 & 0.239843012362853
\end{array}
\]
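The properties claimed for these tableaux can be checked numerically. The following is a minimal sketch for the order 3, WSO 2 scheme, not the authors' MATLAB code. It assumes that the stage order residuals are $\tau^{(k)} = A c^{k-1} - c^k/k$ and that the eigenvector relation of Definition 3 reads $A\tau^{(k)} = \mu_k \tau^{(k)}$ with $b^T\tau^{(k)} = 0$; the definition itself is not reproduced in this section, so both forms are assumptions. All function names are hypothetical.

```python
# Order 3, WSO 2 tableau, coefficients copied from the text above.
A = [
    [0.01900072890, 0.0, 0.0, 0.0],
    [0.40434605601, 0.38435717512, 0.0, 0.0],
    [0.06487908412, -0.16389640295, 0.51545231222, 0.0],
    [0.02343549374, -0.41207877888, 0.96661161281, 0.42203167233],
]
b = list(A[-1])                 # stiffly accurate: b^T equals the last row of A
c = [sum(row) for row in A]     # abscissae taken as the row sums of A


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(M, x):
    return [dot(row, x) for row in M]


def order3_residuals(A, b, c):
    """Residuals of the four classical order conditions up to order 3."""
    return [
        sum(b) - 1.0,
        dot(b, c) - 1.0 / 2.0,
        dot(b, [ci ** 2 for ci in c]) - 1.0 / 3.0,
        dot(b, matvec(A, c)) - 1.0 / 6.0,
    ]


def wso_eigen_check(A, b, c, k):
    """Return (mu, defect, |b^T tau|) for tau = A c^(k-1) - c^k / k (assumed form)."""
    Ack = matvec(A, [ci ** (k - 1) for ci in c])
    tau = [r - ci ** k / k for r, ci in zip(Ack, c)]
    A_tau = matvec(A, tau)
    mu = dot(tau, A_tau) / dot(tau, tau)          # Rayleigh-quotient estimate of mu_k
    defect = max(abs(x - mu * t) for x, t in zip(A_tau, tau))
    return mu, defect, abs(dot(b, tau))


def stability_function(A, b, z):
    """R(z) = 1 + z b^T (I - zA)^{-1} e by forward substitution (A lower triangular)."""
    x = []
    for i, row in enumerate(A):
        rhs = 1.0 + z * sum(row[j] * x[j] for j in range(i))
        x.append(rhs / (1.0 - z * row[i]))
    return 1.0 + z * dot(b, x)


if __name__ == "__main__":
    print("order residuals:", order3_residuals(A, b, c))
    print("WSO 2 check (mu, defect, |b.tau|):", wso_eigen_check(A, b, c, 2))
    ys = [10.0 ** (e / 4.0) for e in range(-16, 25)]   # samples of y in [1e-4, 1e6]
    print("max |R(iy)| on samples:", max(abs(stability_function(A, b, 1j * y)) for y in ys))
```

Sampling $|R(iy)|$ on a finite grid only gives evidence for A-stability, not a proof; the residual checks are limited by the eleven printed digits of the coefficients.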


4 Numerical Results 


In this section we verify the order of accuracy of the schemes above and demonstrate 
that WSO remedies order reduction for linear problems. We confirm that WSO $p$
is required for ODEs, and WSO $p - 1$ is required for PDE IBVPs. In addition, we
study the effect of WSO for two nonlinear problems. 


4.1 Linear ODE Test Problem 


We consider the linear ODE test problem (1) with the true solution $\phi(t) = \sin(t + \pi/4)$,
the stiffness parameter $\lambda = -10^4$, and the initial condition $u(0) = \sin(\pi/4)$.
The problem is solved using three 3rd order DIRK schemes (with WSO 1, 2, and 
3) and two 4th order DIRK schemes (with WSO 1 and 3)¹ up to the final time
$T = 10$. The convergence results are shown in Fig. 2. In the stiff regime where $|z| =
|\lambda| \Delta t > 1$, first order convergence is observed for the WSO 1 schemes as expected,
the WSO 2 scheme improves the convergence rate to 2, and the WSO 3 schemes 
exhibit 3rd order convergence. In addition to yielding better convergence orders in 
the stiff regime, the schemes with higher WSO also turn out to yield substantially 
smaller error constants in the non-stiff regime ($\Delta t < 1/|\lambda|$). For comparison, we
also display a DIRK scheme with explicit first stage (EDIRK), that is, $a_{11} = 0$,
of stage order 2 (see Theorem 1). The left panel of Fig. 2 shows that the WSO 2
scheme exhibits the same convergence behavior as the stage order 2 EDIRK scheme 
and performs equally well in terms of accuracy. 
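The experiment can be sketched in a few lines. The block below is an illustration under stated assumptions, not the authors' code: it assumes that test problem (1) has the Prothero-Robinson form $u' = \lambda(u - \phi(t)) + \phi'(t)$ with $\phi(t) = \sin(t + \pi/4)$, and the value of $\lambda$ passed in the usage note is illustrative. The function names are hypothetical. Because the problem is linear and scalar, each implicit stage can be solved in closed form.

```python
import math

# Order 3, WSO 2 tableau from Sect. 3 (stiffly accurate: b^T is the last row of A).
A = [
    [0.01900072890, 0.0, 0.0, 0.0],
    [0.40434605601, 0.38435717512, 0.0, 0.0],
    [0.06487908412, -0.16389640295, 0.51545231222, 0.0],
    [0.02343549374, -0.41207877888, 0.96661161281, 0.42203167233],
]
b = list(A[-1])
c = [sum(row) for row in A]


def phi(t):
    return math.sin(t + math.pi / 4.0)


def dphi(t):
    return math.cos(t + math.pi / 4.0)


def dirk_error(lam, dt, t_end):
    """Integrate u' = lam*(u - phi) + phi' with u(0) = phi(0); return |u - phi| at t_end."""
    n_steps = round(t_end / dt)
    u = phi(0.0)
    for n in range(n_steps):
        t_n = n * dt
        slopes = []
        for i, row in enumerate(A):
            t_i = t_n + c[i] * dt
            known = u + dt * sum(row[j] * slopes[j] for j in range(i))
            g = dphi(t_i) - lam * phi(t_i)          # part of the right-hand side independent of u
            # Stage equation U_i = known + dt*a_ii*(lam*U_i + g), solved for U_i.
            stage = (known + dt * row[i] * g) / (1.0 - dt * row[i] * lam)
            slopes.append(lam * stage + g)
        u = u + dt * sum(bj * kj for bj, kj in zip(b, slopes))
    return abs(u - phi(n_steps * dt))


if __name__ == "__main__":
    for dt in (0.04, 0.02, 0.01, 0.005):
        print(dt, dirk_error(-1.0e4, dt, 10.0))
```

Halving the step size and comparing successive errors gives the observed convergence rate in the stiff regime.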


4.2 Linear PDE Test Problem: Schrödinger Equation


As a linear PDE test problem, we study the dispersive Schrödinger equation. The
method of manufactured solutions is used, i.e., the forcing, the boundary conditions
(b.c.) and initial conditions (i.c.) are selected to generate a desired true solution. The
spatial approximation is carried out using 4th order centered differences on a fixed
spatial grid of 10,000 cells. This renders spatial approximation errors negligible and
thus isolates the temporal errors due to DIRK schemes. The errors are measured in
the maximum norm in space.

¹ We do not construct an order 4 scheme with WSO 2, as we see no role for such a method.

Fig. 2 Error convergence for linear ODE test problem (1). Left: 3rd order DIRK schemes with
WSO 1 (blue circles), WSO 2 (red triangles), WSO 3 (black squares), and a 3rd order EDIRK
scheme with stage order 2 (light red dots). Right: 4th order DIRK schemes with WSO 1 (blue
circles) and WSO 3 (red triangles)

Fig. 3 Error convergence for the Schrödinger equation using 3rd order DIRK schemes with WSO
1 (left) and WSO 3 (middle), and a 4th order DIRK with WSO 3 (right)

We consider 


\[
u_t = \frac{i\omega}{k^2}\,u_{xx} \quad \text{for } (x,t) \in (0,1)\times(0,1.2], \qquad u = g \quad \text{on } \{0,1\}\times(0,1.2], \tag{6}
\]


with the true solution $u(x, t) = e^{i(kx - \omega t)}$, $\omega = 2\pi$ and $k = 5$. Figure 3 shows the
convergence orders of $u$, $u_x$ and $u_{xx}$ for 3rd order DIRK schemes with WSO 1 (left),
WSO 3 (middle) and a 4th order DIRK scheme with WSO 3 (right). For IBVPs,
spatial boundary layers are produced by RK methods, thus limiting the convergence
order in $u$ to $q + 1$, with an additional half an order loss per derivative when $q < p$
[9]. As a result, the 4th order WSO 3 scheme recovers 4th order convergence in $u$
and improves the convergence in $u_x$ and $u_{xx}$. When $q = p$, the full convergence
order in $u$, $u_x$ and $u_{xx}$ is achieved, as seen in the middle panel in Fig. 3.

Fig. 4 Error convergence for the viscous Burgers’ equation using 3rd order DIRK schemes with
WSO 1 (left), WSO 2 (middle) and WSO 3 (right)


4.3 Nonlinear PDE Test Problem: Burgers’ Equation


This example demonstrates that WSO avoids order reduction for certain nonlinear 
IBVPs as well. We consider the viscous Burgers’ equation with pure Neumann b.c. 


\[
u_t + u u_x = \nu u_{xx} + f \quad \text{for } (x,t) \in (0,1)\times(0,1], \qquad u_x = h \quad \text{on } \{0,1\}\times(0,1]. \tag{7}
\]


Here $\nu = 0.1$ and $u(x,t) = \cos(2 + 10t)\sin(0.2 + 20x)$. The nonlinear implicit
equations arising at each time step are solved using a standard Newton iteration.
The choice of Neumann b.c. distinguishes this example from the one given in [9].
With Neumann b.c., the convergence order in $u$ is limited to $q + 1.5$ (half an order
better than with Dirichlet b.c.). Figure 4 shows that order reduction arises with the
stage order 1 scheme, and that the WSO 2 scheme recovers 3rd order convergence
for $u$ and $u_x$, and the 3rd order WSO 3 scheme yields 3rd order convergence for $u$,
$u_x$ and $u_{xx}$.


4.4 Stiff Nonlinear ODE: Van der Pol Oscillator 


This example illustrates that DIRK schemes with high WSO may not remove order 
reduction for all types of nonlinear problems. Consider the Van der Pol oscillator 


\[
\dot{x} = y \quad \text{and} \quad \dot{y} = \mu(1 - x^2)\,y - x, \tag{8}
\]


with i.c. $(x(0), y(0)) = (2, 0)$, stiffness parameter $\mu = 500$, and final time $T = 10$.
The nonlinear system at each time step is solved via MATLAB’s built-in nonlinear
system solver. The “exact” solution is computed using explicit RK4 with a time
step $\Delta t = 10^{-5}$. In this case, the presented DIRK schemes with high WSO do not
improve the convergence rates in the stiff regime and they perform worse than the
WSO 1 scheme in terms of accuracy (see Fig. 5). On the other hand, an EDIRK with
stage order 2 improves the rate of convergence in the stiff regime (see right panel
in Fig. 5). However, it does so, interestingly, by yielding larger errors for large time
steps.

Fig. 5 Error convergence for Van der Pol’s equation. Left: 3rd order DIRK schemes with WSO 1
(blue circles), WSO 2 (red triangles) and WSO 3 (black squares). Right: 4th order DIRK schemes
with WSO 1 (blue circles) and WSO 3 (red triangles), and a 3rd order EDIRK scheme with stage
order 2 (black squares)
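The reference solution described above can be sketched as follows. This is a plain-Python illustration of the classical explicit RK4 method applied to system (8), written as $\dot{x} = y$, $\dot{y} = \mu(1 - x^2)y - x$; it is not the authors' MATLAB code and the function names are hypothetical. With the step size $10^{-5}$ and $\mu = 500$ the explicit method operates well inside its stability region.

```python
def vdp_rhs(state, mu):
    """Right-hand side of the Van der Pol system (8)."""
    x, y = state
    return (y, mu * (1.0 - x * x) * y - x)


def rk4_step(state, dt, mu):
    """One step of the classical fourth-order explicit Runge-Kutta method."""
    def shifted(s, k, a):
        return tuple(si + a * ki for si, ki in zip(s, k))

    k1 = vdp_rhs(state, mu)
    k2 = vdp_rhs(shifted(state, k1, 0.5 * dt), mu)
    k3 = vdp_rhs(shifted(state, k2, 0.5 * dt), mu)
    k4 = vdp_rhs(shifted(state, k3, dt), mu)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def reference_solution(t_end, dt=1.0e-5, mu=500.0, state=(2.0, 0.0)):
    """Integrate from t = 0 to t_end with fixed step dt and return (x, y)."""
    n_steps = round(t_end / dt)
    for _ in range(n_steps):
        state = rk4_step(state, dt, mu)
    return state


if __name__ == "__main__":
    # A short run; the full reference to T = 10 needs 10**6 steps.
    print(reference_solution(0.01))
```

After the fast initial transient, $y$ settles near the slow manifold value $x / (\mu(1 - x^2))$, which gives a quick sanity check of the implementation.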


5 Conclusions and Outlook 


This study demonstrates that it is possible to overcome order reduction (OR) for 
certain classes of problems in the context of DIRK schemes, even though these 
are limited to low stage order. A specific weak stage order (WSO) “eigenvector” 
criterion has been presented, analyzed, and applied to determine DIRK schemes 
with WSO up to 3. The numerical results confirm that the schemes avoid OR for 
linear problems and for some nonlinear problems in which the mechanism for order 
reduction is linear (i.e., boundary conditions). The key limitation found herein is that
the eigenvector criterion cannot go beyond WSO 3 for DIRK schemes. Hence, a key
question of future research is how high WSO is admitted by the general criterion
in Definition 2. Another important future research task is to devise further DIRK
schemes that are truly optimized in terms of truncation error coefficients or other
criteria.


1 Introduction


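This work presents a numerical algorithm for the system of the Navier-Stokes
equations coupled with the balance of internal energy

\[
\rho\left(\frac{\partial \mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\right) = -\nabla p + \frac{1}{Re}\,\nabla\cdot\left[2\mu\mathbf{D} + \lambda\,(\nabla\cdot\mathbf{v})\,\mathbf{I}\right] + \mathbf{f}_v, \tag{1a}
\]
\[
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{v}) = m, \tag{1b}
\]
\[
\frac{\partial T}{\partial t} + \mathbf{v}\cdot\nabla T = \frac{1}{Re\,Pr}\,\nabla\cdot(\kappa\nabla T) + f_T, \tag{1c}
\]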

where $\mathbf{v} = [u, v, w]^T$ is the velocity vector (by setting $w = \text{const.} = 0$ we restrict
to a 2D problem), $p$ is a variable related to the thermodynamic pressure,¹ $T$ denotes
the temperature, $\mathbf{D} = \frac{1}{2}\left[\nabla\mathbf{v} + (\nabla\mathbf{v})^T\right]$ is the symmetric part of the rate of strain
tensor, constant $Re$ is the Reynolds number and constant $Pr$ is the Prandtl number
(for the sake of simplicity we set $Re = Pr = 1$ for the testing on an exact solution).

¹ We call thermodynamic pressure the variable acting in the equation of state, e.g. $p = \rho R T$ for
an ideal gas. Quantities with physical units (superscript star) are normalized by their farfield values
(subscript infinity), e.g. $\mathbf{v} = \mathbf{v}^*/v_\infty^*$, $T = T^*/T_\infty^*$, etc. The dimensionless pressure in (1a) is
$p = p^*/(\rho_\infty^* v_\infty^{*2})$.

The fluid is expected to be (calorically) perfect² and Newtonian,³ with a heat flux that
obeys the Fourier law.⁴ In system (1), we consider fluids that become
nonhomogeneous in variable temperature fields due to the temperature dependence of
their material parameters, namely the density $\rho = \rho(T)$, dynamic viscosity $\mu = \mu(T)$
and thermal conductivity $\kappa = \kappa(T)$.

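Instead of (1a), we solve

\[
\rho\left(\frac{\partial \mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v}\right) = -\nabla\bar{p} + \frac{1}{Re}\,\nabla\cdot\left[2\mu\mathbf{D} - \frac{2}{3}\,\mu\,(\nabla\cdot\mathbf{v})\,\mathbf{I}\right] + \mathbf{f}_v, \tag{2}
\]

where $\bar{p} = p - \mu_b \nabla\cdot\mathbf{v}$ is the mean or mechanical pressure, while $\mu_b = \lambda + \frac{2}{3}\mu$ is the
bulk viscosity. Equation (2) has the same structure as (1a) when setting $\lambda = -\frac{2}{3}\mu$ (or
equivalently $\mu_b = 0$, c.f. Stokes hypothesis), but the physical interpretation of pressure
changes.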

Without loss of generality, by solving (2) instead of (1a), we avoid specification of
the second viscosity coefficient $\lambda$.

The forcing terms $\mathbf{f}_v$, $f_T$ may represent the action of volumetric forces, e.g. gravity
or viscous heating, but $m$ is set to zero in most realistic situations. When testing
our algorithm on a given solution $[\mathbf{v}_e, p_e, T_e]^T$, we construct the forcing terms
such that Eqs. (2), (1b) and (1c) are satisfied.

Our computational scheme is developed for simulations based on the spectral/hp
element approximation in spatial coordinates. We use polynomial
approximations of degree 15 in our tests, which eliminates the numerical error in
spatial coordinates, so that we obtain an overview of the error production that
belongs directly to the algorithm/discretisation in time. The high order spatial
approximations also naturally include approximations of higher-order derivatives,
which is utilized in the scheme.

To the author's knowledge, the previous results from the literature are restrictions
of (1), setting at least one of the material parameters constant, the velocity field to be
divergence-free, or modelling a stationary flow, see Table 1.


² The internal energy $e$ of calorically perfect fluids obeys $e = c_v T$, where the specific heat at constant
volume is independent of temperature ($c_v = \text{const.}$).

³ We use the term Newtonian fluid in a general sense for fluids whose stress tensor is linearly
dependent on the strain rate tensor. However, the viscous part of the stress tensor is not traceless, as is
often expected if a fluid is called Newtonian.

⁴ The Fourier law relates the heat flux $\mathbf{q}$ to the thermal conductivity $\kappa$ and the temperature gradient
$\nabla T$ as $\mathbf{q} = -\kappa\nabla T$.

Table 1 Chosen results


| | Eq. type |V-v |н |k p 
concerning equation systems T "um 0 иббс дас 
with variable material = 
parameters [4, 5] nonst. 0 const. | const. | var. 

[6] nonst. 0 var. const. | const. 
[9] | stat. 10 ШТ) к(Т) |const. 
[10] | nonst. 0 и(Т) |к(Г) | const. 
[11, 12] stat. 0 и(Т) |к(Т) | const. 
[13] nonst. #0 |const. | const. | var. 

[14] nonst. 0 и(Т) |к(Т) | const. 
[16] nonst. 0 и(Т) |к(Г) | const. 


Stationary and non-stationary models are denoted stat. 
and nonst., unspecified variability of a property is 
denoted var. 


2 Algorithm 


Our approach is inspired by the velocity-correction scheme with the high order
pressure boundary condition (HOPBC) proposed for the incompressible Navier-Stokes
equations in [7]. The constant property scheme of [7] is widely used for its
efficiency and was already extended to problems with variable viscosity in [6].
A modification of it was also applied to the incompressible Navier-Stokes-Fourier system
with temperature dependent viscosity and thermal conductivity in [10]. The efficiency
of the approach comes from the implicit-explicit (IMEX) formulation, which allows
decoupling of the system.

The main contribution of the present work, which is a continuation of [10], is
the extension to problems with temperature dependent density. Here the
velocity divergence can no longer be neglected in the momentum balance, which
is the substantial difference from the previously discussed models and algorithms.


2.1 Decoupled System 


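We use the IMEX scheme in which the backward difference formula (BDF) of order
$Q$ approximates the temporal derivative and a consistent extrapolation is applied to
chosen terms ($\mathcal{N}$)

\[
\frac{\partial u}{\partial t} = \mathcal{L}(u) + \mathcal{N}(u) \;\xrightarrow{\text{IMEX}}\; \frac{\gamma u_{n+1} - \sum_{q=0}^{Q-1}\alpha_q u_{n-q}}{\Delta t} = \mathcal{L}u_{n+1} + \sum_{q=0}^{Q-1}\beta_q\,\mathcal{N}(u_{n-q}). \tag{3}
\]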


In (3), $u$ is the sought solution and $\mathcal{L}$ denotes the terms solved implicitly, which we
expect to be constant in time. Subscript $n + 1$ (or operator in square brackets with
subscript) denotes evaluation at time $t_{n+1} = t_0 + (n+1)\Delta t$, where $\Delta t$ is the discrete
time step. Coefficients $\{\alpha_q\}_{q=0}^{Q-1}$, $\{\beta_q\}_{q=0}^{Q-1}$ and $\gamma$ for particular $Q$ can be found, e.g.,
in [10]. Henceforward, we use “*” in the superscript to denote extrapolation,
$\mathcal{N}^* = [\mathcal{N}]^*_{n+1} = \sum_{q=0}^{Q-1}\beta_q\,\mathcal{N}_{n-q}$.

The extrapolated terms are evaluated using data from previous time steps,
$\{\mathbf{v}_{n-q}\}_{q=0}^{Q-1}$ and $\{T_{n-q}\}_{q=0}^{Q-1}$, which allows separate/decoupled solution of the
(generalized) Navier-Stokes equations (2)-(1b) and the non-linear energy equation (1c).
The solution during one time step may be summarized by the scheme


1. Update $\mu$, $\kappa$, $\rho$, $\nabla\cdot\mathbf{v}$, and HOPBC using the already known values $\{\mathbf{v}_{n-q}, T_{n-q}\}_{q=0}^{Q-1}$


2. Solve the system of momentum and mass balance


(a) Solve the pressure-Poisson equation for $\bar{p}_{n+1}$
(b) Solve velocity-correction for $\mathbf{v}_{n+1}$


3. Solve the non-linear advection-diffusion problem for $T_{n+1}$.
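The IMEX-BDF time stepping in (3) can be illustrated on a scalar model problem. The sketch below is not the solver described in this paper: it assumes the standard stiffly stable coefficients ($\gamma = 1$, $\alpha_0 = 1$, $\beta_0 = 1$ for $Q = 1$; $\gamma = 3/2$, $\alpha = (2, -1/2)$, $\beta = (2, -1)$ for $Q = 2$), which are expected, but not confirmed here, to match those tabulated in [10]. The model equation, the use of exact start-up values, and all names are hypothetical choices made for the example.

```python
import math

# Assumed standard IMEX-BDF coefficients for Q = 1 and Q = 2.
COEFFS = {
    1: {"gamma": 1.0, "alpha": [1.0], "beta": [1.0]},
    2: {"gamma": 1.5, "alpha": [2.0, -0.5], "beta": [2.0, -1.0]},
}


def imex_bdf(Q, lin, nonlin, exact, dt, n_steps):
    """Integrate u' = lin*u + nonlin(t, u) from t = 0; lin is a constant scalar.

    The linear term is treated implicitly, the rest is extrapolated as in (3).
    Start-up values at t = 0, -dt, ... are taken from the exact solution.
    """
    k = COEFFS[Q]
    history = [(-q * dt, exact(-q * dt)) for q in range(Q)]   # newest entry first
    for n in range(n_steps):
        rhs = sum(a * u for a, (_, u) in zip(k["alpha"], history)) / dt
        rhs += sum(bq * nonlin(t, u) for bq, (t, u) in zip(k["beta"], history))
        u_new = rhs / (k["gamma"] / dt - lin)     # (gamma/dt - L) u_{n+1} = rhs
        history = [((n + 1) * dt, u_new)] + history[:-1]
    return history[0]


def model_nonlin(t, u):
    """Forcing chosen so that u(t) = sin(t) solves u' = -u + model_nonlin(t, u)."""
    return math.cos(t) + math.sin(t) + math.sin(t) ** 2 - u ** 2


if __name__ == "__main__":
    for Q in (1, 2):
        t, u = imex_bdf(Q, -1.0, model_nonlin, math.sin, 0.01, 100)
        print(Q, abs(u - math.sin(t)))
```

Because the implicit operator is constant, the factor `gamma/dt - lin` is the same in every step, which is the scalar analogue of the time-independent matrices that make the direct solves efficient.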


2.2 Balance of Momentum and Mass 


The scheme decouples solution of the Navier-Stokes system (2)-(1b) to the 
pressure-Poisson equation and an elliptic equation for velocity. The equation for 
pressure is derived as a projection to the irrotational space by application of the 
divergence operator to (2) 


2, [80] _ MES : 
vend ([#] тп] gs У (V x V x v)] 


1 1 AT 
+9. [xe + дк [Ун (vem )| 


1 2 " 4 " 
t—|-ZzUIVAYIV Уи + ИМУ Уи | + fi 
Re 3 3 


(4) 


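where we applied (1b) and the identities $\nabla\times\nabla\times\mathbf{v} = \nabla(\nabla\cdot\mathbf{v}) - \nabla^2\mathbf{v}$ and $\nabla\cdot(\nabla\times\,\cdot\,) = 0$;
$\partial\mathbf{v}/\partial t$ was substituted by BDF and $\hat{\mathbf{v}} = \sum_{q=0}^{Q-1}\alpha_q\mathbf{v}_{n-q} - \Delta t\,[\mathbf{v}\cdot\nabla\mathbf{v}]^*$. The temporal
derivative of the density, which is extrapolated in (4), is approximated by Q-th order
BDF

\[
\left[\frac{\partial\rho}{\partial t}\right]_n \approx \frac{\gamma\rho_n - \sum_{q=0}^{Q-1}\alpha_q\,\rho_{n-1-q}}{\Delta t}. \tag{5}
\]

We denote the extrapolation of the derivative approximation by superscript “**”.
Note that we have to specify the initial value $[\partial\rho/\partial t]_0$ or both $\rho_0$, $\rho_{-1}$ to initialise the
scheme of the lowest order $Q = 1$.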

Our model assumes that the density is entirely determined by the temperature
distribution. Then, the divergence of velocity, whose forward estimate, $[\nabla\cdot\mathbf{v}]_{n+1}$,
is required in (4), follows from (1b)


1 К 9o 1** 
[V yhy © p* = = [у. Ур] — Ed | . (6) 


The forward estimate of velocity divergence is the crucial step in the proposed 
scheme. 

HOPBC is the natural boundary condition for (4). It is derived as the projection of
the momentum equation (2) to the direction of the normal $\mathbf{n}$ to the domain boundary $\partial\Omega$


* 


др ду | 1 
SE зү, I^ M + Е ТЕ (и хУху+Уџ: [уу + ww] 
e 


1 2 * 4, 
TRe 73 [Ми] [У Уи + 3“ У [У -vlny | + £o 


(7) 


The forward estimate of velocity divergence follows from (6) again. Similarly to
(4), we approximate the acceleration term $\partial\mathbf{v}/\partial t$ by the BDF of Q-th order, whose
initialisation requires the value $[\partial\mathbf{v}/\partial t]_0$ or both the values $\mathbf{v}_0$ and $\mathbf{v}_{-1}$. The problem of
initialisation of $[\partial\rho/\partial t]_0$ and $[\partial\mathbf{v}/\partial t]_0$ is circumvented in many realistic simulations,
which begin from constant fields.

The solution of (4) gives an estimate/prediction of $\bar{p}_{n+1}$ and we can solve (2) as an
elliptic problem for $\mathbf{v}_{n+1}$. However, in the case of temperature dependent viscosity
and density, the algebraic system derived for operators with variable coefficients has
time dependent matrices, whose direct solution is inefficient. To preserve the efficiency
of the scheme, we split such operators into a time independent part, which is solved
implicitly using a direct method, and a variable part, which is extrapolated together
with the non-linear terms. We introduce material properties in the form


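\[
\mu = \mu(T) = \bar{\mu} + \mu_i, \qquad \kappa = \kappa(T) = \bar{\kappa} + \kappa_i, \qquad \frac{1}{\rho(T)} = \overline{\left(\frac{1}{\rho}\right)} + \left(\frac{1}{\rho}\right)_i, \tag{8}
\]

where $\bar{\mu}$ and $\bar{\kappa}$ are time-independent, while $\mu_i = \mu_i(\mathbf{x}, t)$ and $\kappa_i = \kappa_i(\mathbf{x}, t)$. The
variable density $\rho = \rho(T)$ acts in our scheme as an inverse value, c.f. (10), so the
splitting is done accordingly.

To demonstrate the splitting, we consider the second order operator with variable
viscosity and density

\[
\frac{1}{\rho}\,\nabla\cdot[\mu\nabla\mathbf{v}] = \overline{\left(\frac{1}{\rho}\right)}\nabla\cdot[\bar{\mu}\nabla\mathbf{v}] + \left(\frac{1}{\rho}\right)_i\nabla\cdot[\mu\nabla\mathbf{v}] + \overline{\left(\frac{1}{\rho}\right)}\nabla\cdot[\mu_i\nabla\mathbf{v}]. \tag{9}
\]

Only the term with the time independent operator $\overline{(1/\rho)}\,\nabla\cdot[\bar{\mu}\nabla\mathbf{v}]$ is solved implicitly,
while we apply extrapolation to terms containing the variable parameters $(1/\rho)_i$ and $\mu_i$.
This approach is valid for $\bar{\mu} = \bar{\mu}(\mathbf{x})$, $\bar{\kappa} = \bar{\kappa}(\mathbf{x})$, resp. $\overline{(1/\rho)} = \overline{(1/\rho)}(\mathbf{x})$, but if $\bar{\mu}$
is constant in space, the constant operator simplifies to $\overline{(1/\rho)}\,\bar{\mu}\,\nabla^2\mathbf{v}$ (resp. $\nabla^2\mathbf{v}$ if the
properties are normalized to $\bar{\mu} = \bar{\kappa} = \bar{\rho} = 1$, which is the case of (1), the balance
equations in a form independent of physical units).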
The final form of the equation for velocity becomes 


y p 
V?v — ——Rev = 
п+1 At п+1 


p Ya, 1 = тү|* 
a) Rear + oe | Re Patt — ино - [Vu [Vv + (Vy) 1] 


а n | 
+ 31701 [V Уи 3^ VIV - У] 


TE 1 
— (VIV Уи - IV x V x р 2 () hát 
(VIV Уи — [V x V x v] ЈЕ о) 


(10) 


2.3 Balance of Energy 


The energy equation with temperature dependent thermal conductivity is strongly
non-linear. We split the diffusion operator into the time independent and the variable
part, following the technique shown for the velocity-correction (10). We set $\bar{\kappa} = 1$
for simplicity and the discretized energy equation (1c) takes the form


^ 


y Re Pr T ж 
Такі = RePr erc -[V.« VT], (11) 


At 


VT — 
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here 7 = ELl oor, 4 — Ае |у: УТ". Operator (v? — ZBER" | is ti 
where = 2-9 94Tn-q M . Operator MC] is time 
independent and allows inversion using a direct method, which results in good

performance in computations on long time intervals.


3 Temporal Convergence on Manufactured Solution 
and Application 


Our convergence tests are based on the method of manufactured solutions, as an
alternative to estimates from numerical analysis on a simplified system. This approach lacks
generality, because we always restrict to particular data and some representative of
the solution space, but we get a rough convergence estimate for the unrestricted equation
system (1), while also checking the correctness of the method implementation.

As an exact solution, we take smooth functions $\mathbf{v}_e: \Omega\times(0:T) \to \mathbb{R}^2$,
$p_e: \Omega\times(0:T) \to \mathbb{R}$, $T_e: \Omega\times(0:T) \to \mathbb{R}$

\[
\begin{bmatrix}\mathbf{v}_e\\ p_e\\ T_e\end{bmatrix} =
\begin{bmatrix}u_e\\ v_e\\ p_e\\ T_e\end{bmatrix} =
\begin{bmatrix}
2\cos(\pi x)\cos(\pi y)\sin(t)\\
\sin(\pi x)\sin(\pi y)\sin(t)\\
2\sin(\pi x)\sin(\pi y)\cos(t)\\
\sin(x)\sin(y)\cos(t)
\end{bmatrix} \tag{12}
\]

and derive the forcing terms $\mathbf{f}_v$, $m$ and $f_T$ such that Eqs. (2), (1b), (1c) are
fulfilled (in all cases we set $Re = Pr = 1$). The divergence of velocity in (12)
is $\nabla\cdot\mathbf{v}_e = -\pi\sin(\pi x)\cos(\pi y)\sin(t)$, variable in both the spatial and temporal
coordinates and with amplitude comparable with the solution itself. We choose
a computational domain $\Omega = [0:2]\times[0.5:2.5]$ consisting of two elements, $\Omega =
[0:1]\times[0.5:2.5] \cup [1:2]\times[0.5:2.5]$. The extent of $\Omega$ and the form of the exact solution are
inspired by [3], where the velocity-correction scheme of [7] was tested on a similar
manufactured solution.
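The stated divergence of the manufactured velocity field can be cross-checked numerically. The sketch below assumes the velocity components $u_e = 2\cos(\pi x)\cos(\pi y)\sin(t)$ and $v_e = \sin(\pi x)\sin(\pi y)\sin(t)$, compares a central finite-difference divergence with the closed form $-\pi\sin(\pi x)\cos(\pi y)\sin(t)$, and uses hypothetical function names and sample points.

```python
import math


def u_e(x, y, t):
    return 2.0 * math.cos(math.pi * x) * math.cos(math.pi * y) * math.sin(t)


def v_e(x, y, t):
    return math.sin(math.pi * x) * math.sin(math.pi * y) * math.sin(t)


def div_closed_form(x, y, t):
    return -math.pi * math.sin(math.pi * x) * math.cos(math.pi * y) * math.sin(t)


def div_finite_difference(x, y, t, h=1.0e-5):
    """Second-order central differences for du/dx + dv/dy."""
    du_dx = (u_e(x + h, y, t) - u_e(x - h, y, t)) / (2.0 * h)
    dv_dy = (v_e(x, y + h, t) - v_e(x, y - h, t)) / (2.0 * h)
    return du_dx + dv_dy


if __name__ == "__main__":
    # Sample points inside the domain [0:2] x [0.5:2.5] at a few times.
    for (x, y, t) in [(0.3, 0.7, 0.5), (1.2, 1.9, 1.0), (1.75, 2.4, 0.2)]:
        print(div_finite_difference(x, y, t), div_closed_form(x, y, t))
```

The same pattern (evaluate the exact fields, differentiate, compare) is what produces the forcing terms in a manufactured-solution test, only with the full operators of (2), (1b) and (1c).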

The incompressible Navier-Stokes equations define the pressure up to a constant
value and only the boundary condition for velocity is needed. In this sense, we set
the Dirichlet boundary condition for velocity on the whole of $\partial\Omega$. However, the
pressure-Poisson equation (4) requires setting a boundary condition as a consequence of the
decoupling. We set HOPBC (7) at $\partial\Omega$ and solve the fully Neumann problem, which
defines the solution up to a constant value, which we set by fixing the solution to zero
in one of the grid points. The boundary condition for pressure is an artificial element
of the computational scheme and its existence is related to the splitting error. The
boundary condition for the energy equation (11) is of Dirichlet type on the whole of $\partial\Omega$.

We present the first and second order schemes in time in the convergence tests.
The technique is applicable to higher-order schemes as well. Multi-step schemes
use data from multiple time steps, which complicates their initialisation. We apply
the first order BDF method for initialisation of the second order scheme. The first
order scheme needs data of only one backward time step, but the time step must be
appropriately shortened.

As mentioned already, the acceleration in HOPBC (7) and the term $\partial\rho/\partial t$ in (4)
require an initial value or one other backward value for proper initialisation also in
the case of the first order scheme, which is in contradiction to standard initial conditions
for system (1), which require only the initial values. However, setting the correct
values for calculation of the first time step is crucial for the final accuracy of the
solution.

Finally, we trace appropriate norms of the difference between the exact and computed
solutions on a set of computations with time steps $\Delta t = \Delta t_0/2^n$, $\Delta t_0 =
0.2$, $n = 0,\dots,9$ for $t \in [0:1]$.
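The observed temporal order follows from the errors on successive time steps. A minimal sketch, with hypothetical function names and synthetic error data standing in for measured values:

```python
import math


def time_steps(dt0=0.2, n_max=9):
    """Time steps dt0 / 2**n for n = 0, ..., n_max."""
    return [dt0 / 2 ** n for n in range(n_max + 1)]


def observed_orders(dts, errors):
    """Estimated order between consecutive runs: log(e_i/e_{i+1}) / log(dt_i/dt_{i+1})."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(dts[i] / dts[i + 1])
        for i in range(len(dts) - 1)
    ]


if __name__ == "__main__":
    dts = time_steps()
    synthetic = [3.0 * dt ** 2 for dt in dts]   # stand-in data, not results from the paper
    print(observed_orders(dts, synthetic))
```

For real data the estimates typically approach the nominal order Q only in the asymptotic range of small time steps.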

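We use power laws to approximate the dependence of the material parameters
on temperature

\[
\mu(T) = (\alpha_\mu T + 1.0)^{\beta_\mu}, \qquad \kappa(T) = (\alpha_\kappa T + 1.0)^{\beta_\kappa}, \qquad \rho(T) = (\alpha_\rho T + 1.0)^{\beta_\rho}. \tag{13}
\]

The temporal convergence of the above scheme for $\alpha_\mu = \alpha_\kappa = \alpha_\rho = 0.1$, $\beta_\mu =
\beta_\kappa = \beta_\rho = 2$ is shown in Fig. 1.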

A detailed view of the error production, Fig. 2, shows that the dominant error
production arises at the grid point which was used to set the unknown constant
for the Neumann problem.

The scheme was successfully applied in a 2D simulation of flow around a
heated cylinder and the results were compared with experimental data [15], where
the dependence of the vortex shedding frequency (Strouhal number St) on the wall
temperature of the cylinder, $T_w$, was observed. Figure 3 shows the substantial
difference in results between the model neglecting the thermal expansion [10] and
the present one. Figure 4 shows the value range and structure of velocity divergence in a
chosen realistic simulation.


Fig. 1 The temporal convergence for the Navier-Stokes-Fourier system
with temperature dependent material properties. The number of steps in BDF,
$Q = 1$ or $Q = 2$, is denoted by subscripts 1 and 2. The label “Errors” refers to
$\|a - a_e\|_{L_\infty}$, where $a$ is the computed function and $a_e$ the
exact value from (12), at $t = 1$. Axes: errors versus $\Delta t$.

Fig. 2 Test on manufactured
solution: difference
$v_{err} = v_e - v$ at the final
time $t = 1$ for computation
with Q = 2, At = 0.2/2’, 
c.f. Fig. 1. Polynomial 
approximation of degree 15 


Fig. 3 Frequency of the
vortex shedding (Strouhal no.
“St”) as dependent on the
normalized wall temperature
$T_w$, in the flow around heated
cylinder ($Re = 121.2$).
Comparison of the data from
[10] ($\rho$ = const.) “const.”, the
present scheme with $Q = 1$
“$\rho(T)$”, experimental data of
[15] “exp.” and empirical
formula “emp.” [8]


Fig. 4 Computed field of
divergence, div(v) = $\nabla\cdot\mathbf{v}$,
caused by the thermal
expansion in the flow around
heated cylinder ($Re = 121.2$,
$T_w/T_\infty = 1.494$), c.f. Fig. 3


4 Conclusion


The numerical scheme proposed for the Navier-Stokes-Fourier system with variable
parameters makes it possible to solve a highly complex mathematical model, which has an
impact on understanding the processes connected with heat exchange, transport
and energy storage in fluids.

A computational scheme for fluid flows influenced by temperature, as modelled
by system (2), (1b), (1c), was developed and tested. The scheme was primarily
constructed for spatial discretisations based on spectral/hp finite elements, and the
presented results were obtained after implementation in the Nektar++ framework
[2], modified version 3.3.

We did not impose restrictions on the type of functional dependency of the
material parameters on temperature. The graph of error convergence in the $L_\infty$ norm, Fig. 1,
results from testing on a manufactured solution and shows good convergence
properties of the scheme, which is promising for applications.

The considered model neglects compressibility in the sense of a direct dependence of
density on pressure, but the velocity field is not divergence free as a consequence of
the thermal expansion. A forward estimate of velocity divergence is needed in the
proposed scheme and its successful approximation is one of the main contributions
presented in this work. For these reasons, the scheme is unique among numerical
schemes based on finite element approximations in space.

The proposed scheme is an extension of the efficient semi-implicit solver for the
incompressible Navier-Stokes system [7] and is suitable for fast and highly
accurate simulations of problems on long time intervals. The present results motivate
the implementation of high order BDF schemes and the extension of the solver to three spatial
coordinates.

The derivation of the scheme includes a number of sub-steps, whose detailed
description is beyond the scope of this article and will be published separately, together with
an extension of the scheme to the energy equation with variable density and further testing
of performance as dependent on various physical parameters in the equations.

The results from applying the scheme to computations of a physically
realistic problem also agree well with experimental data; this comparison
will be presented with a detailed description in a separate article.


Implicit Large Eddy Simulations
for NACA0012 Airfoils Using
Compressible and Incompressible
Discontinuous Galerkin Solvers


Esteban Ferrer, Juan Manzanero, Andres M. Rueda-Ramirez, 
Gonzalo Rubio, and Eusebio Valero 


1 Introduction 


High order Discontinuous Galerkin (DG) methods provide accurate solutions by 
enabling arbitrarily high polynomial approximations inside each grid element. For 
high order polynomials, the numerical errors are not distributed along all wave- 
numbers but localised at high wave-numbers [1-5]. This characteristic of high order 
methods results in very accurate simulations with low dissipative and dispersive 
errors. Although this characteristic seems a-priori beneficial for well resolved 
simulations, when computing under-resolved Large Eddy Simulations (LES), it can 
prove difficult to obtain stable simulations. In implicit (or under-resolved) Large 
Eddy Simulations (ILES), the smallest numerical eddies are larger than would have 
been in a finer mesh, leading to numerical under-resolution (i.e. coarse grid or low 
polynomial order) and aliasing [6]. Various methods have been proposed to stabilise
under-resolved computations with aliasing. Among others, split forms or skew
symmetric variants [7, 8], localised interior penalty fluxes [9], over-integration [10-12]
or filtering [13] may be incorporated into the solver to stabilize the computations
and remove or alleviate the aliasing.

In contrast to low order methods, high order methods do not have enough
inherent numerical dissipation in under-resolved simulations to dissipate large flow
structures (when compared to Kolmogorov scales). Therefore, computations of ILES
flows using high order DG solvers require localised dissipative mechanisms to
dissipate flow structures close to cut-off size. In what follows, we compare two
dissipative stabilising mechanisms that enable the simulation of turbulent
under-resolved flows. On the one hand, we use a compressible formulation with an
energy conserving split-form and dissipation through Roe fluxes [14]. On the other
hand, the incompressible solver uses the viscous discretisation through an interior
penalty formulation to enhance stability [9]. We challenge both formulations with
a NACA0012 airfoil at various angles of attack in turbulent regimes, to explore
both accuracy and stability. We compare simulated results to experimental data and
simulations using low order methods (Xfoil and Ansys-Fluent).


2 Methodologies 


We first introduce the two different mechanisms used to stabilise both compressible 
and incompressible high order DG formulations. The explanation included here is 
brief and aims only at introducing the fundamental concepts and motivating ideas. 
Further details can be found in the following references by the authors [9, 14]. 

The 3D Navier-Stokes equations can be written as:

\[
\mathbf{u}_t + \nabla\cdot\mathbf{F}_e = \nabla\cdot\mathbf{F}_v, \tag{1}
\]

where $\mathbf{u}$ is the vector of conservative variables $\mathbf{u} = (\rho, \rho v_1, \rho v_2, \rho v_3, \rho e)^T$ in
compressible solvers. For incompressible solvers $\mathbf{u} = (v_1, v_2, v_3)^T$ and Eq. (1) is
complemented with $\nabla\cdot\mathbf{u} = 0$. Details on the definition of inviscid and viscous solvers
can be found in [9, 14]. To derive discontinuous Galerkin schemes, we consider
Eq. (1) for one mesh element $el$, multiply by a locally smooth test function $\phi_j$, for
$0 \le j \le P$, where $P$ is the polynomial degree, and integrate on $el$:


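\[
\int_{el} \mathbf{u}_t\,\phi_j + \int_{el} \nabla\cdot\mathbf{F}_e\,\phi_j = \int_{el} \nabla\cdot\mathbf{F}_v\,\phi_j. \tag{2}
\]

We can now integrate by parts the integral of the inviscid fluxes, $\mathbf{F}_e$, to obtain a local weak
form of the equations (one per mesh element):

\[
\int_{el} \mathbf{u}_t\,\phi_j + \int_{\partial el} \mathbf{F}_e\cdot\mathbf{n}\,\phi_j - \int_{el} \mathbf{F}_e\cdot\nabla\phi_j = \int_{el} \nabla\cdot\mathbf{F}_v\,\phi_j, \tag{3}
\]

where $\mathbf{n}$ is the normal vector at element boundaries $\partial el$. We replace discontinuous
fluxes at inter-element faces by a numerical inviscid flux, $\mathbf{F}_e^*$, to obtain a weak form
for the equations for each element,

\[
\int_{el} \mathbf{u}_t\,\phi_j + \int_{\partial el} \mathbf{F}_e^*\cdot\mathbf{n}\,\phi_j - \int_{el} \mathbf{F}_e\cdot\nabla\phi_j = \int_{el} \nabla\cdot\mathbf{F}_v\,\phi_j, \tag{4}
\]

where we have omitted the fluxes at external boundaries, for simplicity. This set of
equations for each element is coupled through the inviscid fluxes $\mathbf{F}_e^*$ and governs
the flow behaviour. Note that one can proceed similarly and integrate by parts the
viscous terms (see [9, 15]), but here for simplicity we retain the volume integral.

\[
\int_{el} \mathbf{u}_t\,\phi_j + \underbrace{\int_{\partial el} \mathbf{F}_e^*\cdot\mathbf{n}\,\phi_j}_{\text{Riemann solver}} - \int_{el} \mathbf{F}_e\cdot\nabla\phi_j = \underbrace{\int_{el} \nabla\cdot\mathbf{F}_v\,\phi_j}_{\text{Viscous term}} \tag{5}
\]

The non-linear inviscid and viscous terms that can be discretised to control
dissipation in the numerical scheme have been underlined.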

Riemann solvers are the classic option to include numerical dissipation in 
DG schemes [16, 17], since they naturally arise when discretising the non-linear 
terms. Comparison of different fluxes for homogeneous turbulence can be found in 
[14, 18]. A different option is to modify the viscous terms to enhance their dissipative
properties. The latter has been proposed in [9] using an increased penalty parameter
(compared to the minimum required to ensure coercivity of the scheme) when
discretising the viscous terms using an interior penalty formulation.


2.1 Compressible DGSEM Solver 


The compressible solver uses conservative variables to solve the Navier-Stokes 
equations. We use a particular nodal variant of DG methods: the Discontinuous 
Galerkin Spectral Element Method (DGSEM), see for example [19]. In addition, 
the compressible formulation is modified to be energy preserving [20]. The required 
split-form necessitates Gauss-Lobatto points to cancel out boundary terms using the
summation-by-parts simultaneous-approximation-term property (SBP-SAT). The 
interested reader is referred to [5, 20-22]. These energy conserving schemes 
are designed to remain stable and energy conserving and consequently do not 
necessitate additional localised numerical dissipation. Nonetheless, in this work we 
introduce dissipation through Roe fluxes, to enhance robustness at high Reynolds 
numbers. Additionally, viscous terms are discretised using the Bassi-Rebay 1 (ВКТ) 
scheme, which is equivalent to the interior penalty formulation when using Gauss- 
Lobatto points and hexahedral elements [23]. Let us note that this formulation for 
the viscous fluxes is neutrally stable [24] and adds the minimum dissipation required 
to achieve a stable scheme, whilst others may introduce some extra dissipation. 
Other techniques are available to discretise second order derivatives and can be 
found in the classic review by Arnold et al. [15]. 


2.2 Incompressible DG-Fourier Solver


Flow solutions of the incompressible Navier-Stokes equations are obtained from
the 3D unsteady high order h/p Discontinuous Galerkin-Fourier solver [9, 25-28].
The solver uses a second order stiffly stable approach to discretise the NS
equations in time whilst spatial discretisation is provided by the discontinuous 
Galerkin-Symmetric Interior Penalty formulation with modal basis functions in 
the x-y plane. Here, x represents the streamwise flow direction and y is the 
normal direction. Spatial discretisation in the z-direction (here defining the spanwise 
airfoil length) is provided by a purely spectral method that uses Fourier series and 
allows computation of spanwise periodic three-dimensional flows. Since high order 
methods (e.g. discontinuous Galerkin and Fourier) are unable to provide enough 
numerical dissipation to enable under-resolved high Reynolds computations (e.g. as 
necessary in Large Eddy Simulations), we have adapted the original laminar version 
of the solver to increase (controllably) the dissipation and enhance the stability in 
under-resolved simulations [9]. This dissipative formulation has minimal impact on 
well resolved flow regions and its implicit treatment does not restrict the use of 
relatively large time steps, thus providing an efficient stabilization mechanism for 
Large Eddy Simulations. The solver has been widely validated for a variety of flows, 
including bluff body flows, airfoil and blade aerodynamics and vertical axis turbines 
under static and rotating conditions [9, 25—30]. 


3 Numerical Results 


This section considers a NACA0012 airfoil at $Re = 1 \times 10^4$, $Re = 1 \times 10^5$ and
$Re = 1 \times 10^6$ (based on the airfoil chord c) for a range of Angles of Attack (AoA):
$0^\circ \le \mathrm{AoA} \le 10^\circ$. In what follows we compare incompressible and compressible
simulations using polynomial orders P = 3 and P = 4. The averaged values have
been computed after the development of three dimensional flow. The compressible
solver uses a hexahedral mesh with 18,000 elements, which for P = 3 and 4 results
in 1.1 and 2.2 million degrees of freedom. The incompressible solver uses a mixed
tri-quad 2D mesh and is expanded using Fourier in the homogeneous third direction
(here 16 Fourier modes). Depending on the angle of attack, the resulting meshes
include 0.6 to 1 million degrees of freedom. Meshes for the two solvers and for
AoA = 0° are depicted in Fig. 1. Finally, all the simulations are computed with both
DG solvers and consider a periodic spanwise length of $L_z/c = 0.1$. Note that we
have not observed significant differences in the results when increasing the spanwise 
length. Statistics are accumulated during at least 40 convective time scales (based 
on the airfoil chord) and starting after the turbulent flow has developed (typically an 
initial transient of 10 convective time scales). 
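
The quoted resolution of the compressible mesh can be checked with a short count. The sketch below assumes the usual tensor-product nodal convention, i.e. $(P+1)^3$ nodes per hexahedral element and per conserved variable; the source does not state the counting convention, and the function name is illustrative.

```python
def hex_dg_dofs(n_elements: int, p: int) -> int:
    """Degrees of freedom (per variable) of a nodal DG discretisation on hexahedra.

    Assumes a tensor-product basis with (p + 1) nodes per direction in each element.
    """
    return n_elements * (p + 1) ** 3


if __name__ == "__main__":
    for p in (3, 4):
        # 18,000 elements as quoted in the text
        print(f"P = {p}: {hex_dg_dofs(18_000, p):,} degrees of freedom")
```

Under this assumption the counts are 1,152,000 for P = 3 and 2,250,000 for P = 4, in line with the 1.1 and 2.2 million quoted above.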

Fig. 1 Meshes for NACA0012 airfoil: (a) Hexahedral mesh for compressible solver and (b) mixed
tri-quad mesh for incompressible solver. Inset figures show high order polynomial mesh for order
P=4

Fig. 2 NACA0012 airfoil at $Re = 1 \times 10^6$, from left to right: AoA = 0°, AoA = 5° and AoA = 10°.
Simulations are obtained using the incompressible DG solver. Panels show contours of velocity in the range [0.85; 1.2]


3.1 $Re = 1 \times 10^6$ and Various Angles of Attack


We start by illustrating the highest Reynolds number case, which is the most 
challenging in terms of stability and robustness. To illustrate the range of the flow 
behaviour at various AoAs, we show in Fig. 2 velocity contours for AoA = 0°, 5°
and 10°, computed using the incompressible DG solver. It can be seen that at
$Re = 1 \times 10^6$ the flow remains attached for all angles, and that only mild separation
is seen near the trailing edge. We will see in the next section that at lower Reynolds 
numbers this is not necessarily the case. 

Figure 3 compares the aerodynamic coefficients with experimental data for 
various angles of attack and the two solvers. Figure 3a shows the lift coefficient 
against the AoA and Fig. 3b depicts the Lift-Drag Polar for $Re = 1 \times 10^6$. We
observe very good agreement with experimental data for both solvers.

Fig. 3 NACA0012 airfoil at $Re = 1 \times 10^6$: (a) Lift coefficient vs angle of attack and (b) Lift-Drag
Polar. Compressible (comp.) and incompressible (incomp.) DG simulations are compared to
experimental data sets of Ladson [31], Gregory and O'Reilly [32], Abbott and Von Doenhoff [33]


3.2 AoA = 5° and Various Reynolds Numbers 


Having shown the overall good performance in terms of aerodynamic quantities 
at the most challenging Reynolds numbers, we now focus our attention on the 
angle AoA = 5° and compare the usability of the solvers to study the NACA0012 
boundary layer evolution. 

First, we compare the aerodynamic coefficients for AoA = 5° and Reynolds
numbers $Re = 1 \times 10^5$ and $Re = 1 \times 10^6$, using the incompressible and
compressible solvers, both with polynomial order P=3 and P=4, in Table 1. We
observe good agreement for the highest polynomial order. Small discrepancies are
attributed to post-processing of statistics and lack of near wall resolution when
using P=3, which influences mainly the drag coefficient and particularly viscous
drag. For completeness, we depict the flow evolution within the boundary layer 
using both solvers in Fig. 4. It can be seen that detachment near the trailing edge 
is similar for both solvers. Regarding transition to turbulence (represented by 
fluctuations in velocity contour), both solvers capture transition on the suction side. 
The compressible solver shows a transition location near the maximum thickness 


Table 1 NACA0012 airfoil at AoA = 5°: comparison of lift and drag using the DG compressible
and DG incompressible solvers and two polynomial orders P=3 and P=4

                 | $Re = 1 \times 10^5$ | $Re = 1 \times 10^6$
                 | Cl    | Cd    | Cl    | Cd
DG comp. P=3     | 0.588 | 0.028 | 0.567 | 0.005
DG comp. P=4     | 0.575 | 0.025 | 0.558 | 0.008
DG incomp. P=3   | 0.484 | 0.028 | 0.538 | 0.017
DG incomp. P=4   | 0.545 | 0.018 | 0.551 | 0.007

Fig. 4 NACA0012 airfoil at Re = 1 x 10° and AoA = 5° for P=4: (a) Compressible DG solver.
(b) Incompressible DG solver

Fig. 5 NACA0012 airfoil at AoA = 5° for (a) $Re = 1 \times 10^4$, (b) $Re = 1 \times 10^5$ and (c) $Re = 1 \times 10^6$.
Velocity magnitude isocontours and unstructured mesh details are included


(x/c = 0.4), whilst the incompressible solver shows transition closer to the leading
edge (x/c ≈ 0.2). We have observed significant variations of the transition location
for the compressible solver when varying the polynomial order, which we have not
seen in the incompressible solver. Further studies are necessary to completely assess
the influence of discretisation on the transition location for the two solvers.

Second, we explore the pressure coefficient distribution along the airfoil profile 
when varying the Reynolds number. We only depict results for the incompressible 
DG solver since these are very similar to the results provided by the compressible 
solver. Note that this is not surprising, since the lift coefficients at $Re = 1 \times 10^5$
and $Re = 1 \times 10^6$ are very similar for P=4 at AoA = 5°, see Table 1. Figure 5
shows velocity contours for $Re = 1 \times 10^4$, $Re = 1 \times 10^5$ and $Re = 1 \times 10^6$ at
AoA = 5°. It can be seen that for the lowest Reynolds number, the boundary layer remains
laminar until it detaches after the maximum thickness, showing a highly unsteady 
wake. When the Reynolds number increases, the boundary layer shows transition 
to turbulence before the maximum thickness, as appreciated by the fluctuations and 
small scales appearing in Fig. 5. 

To quantify these results, we depict in Fig. 6 the pressure distribution (Cp) for the
three Reynolds numbers. In the top row, we show instantaneous Cp against averaged Cp
for the incompressible DG solver. In the bottom row, we compare mean Cp distributions
against Xfoil [34] (with critical N-factor $N_{cr} = 1$) and Fluent SST (fully turbulent
simulation) [35]. At $Re = 1 \times 10^4$, the top figure shows that the boundary layer
detaches before transition occurs and after the maximum thickness, as shown by the
velocity contours in Fig. 5. Since the flow detaches leading to a highly unsteady
wake, there is little hope that the averaged Cp captures the actual behaviour of
the boundary layer. This is why, in the bottom figure, the mean values obtained
using the incompressible DG solver do not agree with the mean Xfoil and Fluent
values that assume steady turbulent flow. At $Re = 1 \times 10^5$ and $Re = 1 \times 10^6$,
the instantaneous Cp values (top row) show scattering in the data associated with
transition. This occurs close to the leading edge on the suction side, whilst it is
delayed towards the trailing edge on the pressure side. The bottom row shows that
the DG results compare very well to Xfoil when using a critical $N_{cr} = 1$ (to set the
transition point close to the leading edge), whilst Fluent SST (fully turbulent) shows
lower Cp values associated with simulating the complete boundary layer as turbulent
(no laminar region). These results suggest that DG solvers using iLES approaches
(compressible and incompressible) can capture transitional behaviour in boundary
layers even when relatively coarse meshes are selected.


4 Conclusions 


In this contribution, we have presented results for turbulent flows over a NACA0012 
airfoil. High order discontinuous Galerkin formulations require localised dissipation 
to remain stable for under-resolved turbulent flow conditions, often referred to as 
implicit Large Eddy Simulations. Here we have presented compressible and
incompressible DG formulations (with different stabilising mechanisms) that are
able to cope with high Reynolds number flows. Both DG formulations provide 
aerodynamic coefficients and boundary layer information that compare favorably 
to experimental data and well established low order solvers. We conclude that the 
compressible and incompressible formulations included in this work can be very 
useful in aeronautical applications. 


SAV Method Applied to Fractional Allen-Cahn Equation


Xiaolan Zhou, Mejdi Azaiez, and Chuanju Xu 


1 Introduction 


The Allen-Cahn equation was originally introduced to describe the motion of anti-phase
boundaries in crystalline solids [1]. There has been a large body of work on
numerical analysis of Allen-Cahn equations (cf. [2—5] and the references therein). 
We aim in this paper to use the SAV scheme, recently introduced and analyzed by 
a number of researchers; see, e.g., [5] and the references therein, to approximate 
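the solution of the fractional version of the Allen-Cahn model. It consists in finding
$\phi : \Omega \times (0, T] \to \mathbb{R}$, solution of

$$
\begin{cases}
\phi_t + \gamma\left((-\Delta)^s \phi + f(\phi)\right) = 0, & \forall (x, t) \in \Omega \times (0, T], \\
\nabla\phi \cdot n|_{\partial\Omega} = 0, & \forall t \in (0, T], \\
\phi(t = 0) = \phi_0(x), & \forall x \in \Omega.
\end{cases}
\tag{1.1}
$$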


In the above, $\gamma$ is a positive kinetic coefficient, $s \in (0, 1)$, $\Omega \subset \mathbb{R}^2$ is a bounded
domain, $n$ is the outward normal, $f(\phi) = F'(\phi)$ with a given function $F(\phi) =
\frac{1}{4\varepsilon^2}(\phi^2 - 1)^2$ being the Ginzburg-Landau double-well potential. The phase field $\phi$
is such that

$$
\phi \approx \begin{cases} 1, & \text{phase 1}, \\ -1, & \text{phase 2}, \end{cases}
$$

and $\varepsilon$ represents the thickness of the smooth transition layer connecting the two
phases, which is small compared to the characteristic length of the system scale.
The homogeneous Neumann boundary condition implies that no mass loss occurs 
across the boundary walls. 

Among the different definitions of fractional Laplacians (see [6, 7] for a 
quantitative assessment of new numerical methods as well as available state-of-the- 
art methods for discretizing the fractional Laplacians problems), we choose in this 
paper to focus on the fractional spectral definition. It is defined by 


$$
(-\Delta)^s u := \sum_{i \in \mathbb{N}} a_i \lambda_i^s e_i,
$$

where $\lambda_i$, $e_i$ are the eigenvalues and eigenfunctions of the Laplace operator $-\Delta$ in
$\Omega$ with homogeneous Neumann boundary condition, i.e., they satisfy

$$
-\Delta e_i = \lambda_i e_i, \quad x \in \Omega, \qquad \nabla e_i \cdot n|_{\partial\Omega} = 0,
$$

while $a_i$ represents the projection of $u$ on the direction $e_i$, $a_i = (u, e_i)_{L^2}$. The
spectral fractional Laplacian is nonlocal on the interior for noninteger $s \in (0, 1)$.
We see that to compute the inner product $a_i = (u, e_i)_{L^2}$, it suffices for $u$ to
be defined on the interior of $\Omega$. No information about $u$ on the exterior $\mathbb{R}^2 \setminus \Omega$
is required. Thus, from a conceptual viewpoint, in boundary value problems the
spectral fractional Laplacian can admit the same type of boundary conditions as the
standard, local Laplacian $-\Delta$. In this paper, we let $\Omega = ]-1, 1[^2$. Set $u(x, t) =
\sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n}(t) e_{m,n}(x)$, where $e_{m,n}$ are the orthogonal eigenfunctions of the
Laplace operator with homogeneous Neumann boundary conditions and $\lambda_{m,n}$ are
the corresponding eigenvalues. Then we define the spectral fractional Laplacian as

$$
((-\Delta)^s u)(x, t) = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \lambda_{m,n}^s\, a_{m,n}(t)\, e_{m,n}(x), \quad 0 < s < 1, \quad \forall u \in H^s(\Omega).
\tag{1.2}
$$




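Here

$$
H^s(\Omega) := \left\{ u = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n} e_{m,n} \in L^2(\Omega) :\ |u|_{H^s} := \left( \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \lambda_{m,n}^s a_{m,n}^2 \right)^{1/2} < \infty \right\}.
$$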
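
To make the spectral definition concrete, the following sketch applies it on $]-1, 1[^2$ by projecting onto a truncated set of Neumann eigenfunctions. It is an illustration under stated assumptions, not the discretisation used in the paper: the eigenfunctions are taken as products of cosines $\cos(k\pi(x+1)/2)$ indexed from $k = 0$ (so the constant mode is included), the projections are computed with Gauss-Legendre quadrature, and the function names are hypothetical.

```python
import numpy as np


def neumann_mode(k, x):
    """k-th Neumann eigenfunction of -d^2/dx^2 on ]-1, 1[, eigenvalue (k*pi/2)**2."""
    return np.cos(k * np.pi * (x + 1.0) / 2.0)


def spectral_fractional_laplacian(u_vals, x, w, s, kmax):
    """Apply (-Delta)^s to samples u_vals[i, j] = u(x[i], x[j]) on ]-1, 1[^2.

    x, w : 1D quadrature nodes and weights on ]-1, 1[.
    kmax : highest mode index kept in each direction (truncation of the series).
    """
    out = np.zeros_like(u_vals)
    for m in range(kmax + 1):
        em = neumann_mode(m, x)
        norm_m = 2.0 if m == 0 else 1.0  # integral of em**2 over ]-1, 1[
        for n in range(kmax + 1):
            en = neumann_mode(n, x)
            norm_n = 2.0 if n == 0 else 1.0
            lam = (np.pi / 2.0) ** 2 * (m * m + n * n)
            # projection coefficient a_{m,n} = (u, e_{m,n}) / ||e_{m,n}||^2
            a_mn = (w * em) @ u_vals @ (w * en) / (norm_m * norm_n)
            out += lam**s * a_mn * np.outer(em, en)
    return out


# u = cos(pi x) cos(pi y) is itself a Neumann eigenfunction with eigenvalue 2 pi^2,
# so (-Delta)^s u should equal (2 pi^2)^s u.
x, w = np.polynomial.legendre.leggauss(32)
u = np.outer(np.cos(np.pi * x), np.cos(np.pi * x))
s = 0.5
frac = spectral_fractional_laplacian(u, x, w, s, kmax=4)
max_err = np.max(np.abs(frac - (2.0 * np.pi**2) ** s * u))
```

Because the operator is diagonal in the eigenbasis, the cost is dominated by the projections; the nonlocality mentioned above shows up only through the fact that every coefficient depends on $u$ over the whole domain.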


The rest of this paper is organized as follows. In Sect. 2, we present briefly the
spectral method by giving some notations and reminders. The fractional Laplace
operator and its possible applications are discussed in Sect. 3. To demonstrate the
applicability of the approximative fractional Laplacian for real applications, we 
consider a fractional Allen-Cahn equation (FACE). Based on the scalar auxiliary 
variable (SAV) approach, we construct an unconditionally second-order energy 
stable BDF scheme (SAV/BDF2) for FACE. We present numerical results for a test 
case as well as a benchmark example in Sect. 4. 


2 Spatial Discretizations 


We limit here the description of the spectral approximation to the introduction 
of some notations and reminders (see [8, 9]). For complex domains, one can use the
spectral element method [10]. Let $X = \{(\xi_i, \rho_i);\ 0 \le i \le N\}$ denote the set of
Gauss-Lobatto-Legendre quadrature nodes and weights associated to polynomials
of degree N. These quantities are such that on $\Lambda := ]-1, +1[$

$$
\forall \varphi \in \mathbb{P}_{2N-1}(\Lambda), \quad \int_{-1}^{1} \varphi(\xi)\, d\xi = \sum_{j=0}^{N} \varphi(\xi_j)\, \rho_j,
\tag{2.1}
$$

where $\mathbb{P}_N(\Lambda)$ denotes the space of polynomials of degree $\le N$. We recall that the
nodes $\xi_i$ ($0 \le i \le N$) are solutions to $(1 - x^2) L_N'(x) = 0$, where $L_N$ denotes the
Legendre polynomial of degree N.

The canonical polynomial interpolation basis $h_i(x) \in \mathbb{P}_N(\Lambda)$ built on $X$ is given
by the relationships:

$$
h_i(x) = -\frac{1}{N(N+1)} \frac{1}{L_N(\xi_i)} \frac{(1 - x^2) L_N'(x)}{x - \xi_i}, \quad -1 \le x \le +1, \quad 0 \le i \le N,
\tag{2.2}
$$

with the elementary cardinality property

$$
h_i(\xi_j) = \delta_{ij}, \quad 0 \le i, j \le N,
\tag{2.3}
$$

where $\delta_{ij}$ is Kronecker's delta symbol.
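
As a hedged illustration of (2.1)-(2.3), the sketch below computes Gauss-Lobatto-Legendre nodes as the roots of $(1 - x^2) L_N'(x)$ and evaluates the cardinal functions of (2.2). The closed-form weights $\rho_j = 2 / (N(N+1) L_N(\xi_j)^2)$ are the standard GLL weights and are not stated in the text; the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre


def gll_nodes_weights(N):
    """Gauss-Lobatto-Legendre nodes and weights for polynomial degree N."""
    LN = Legendre.basis(N)
    # interior nodes are the roots of L_N'; the end points -1 and +1 are added
    interior = np.sort(np.real(LN.deriv().roots()))
    xi = np.concatenate(([-1.0], interior, [1.0]))
    rho = 2.0 / (N * (N + 1) * LN(xi) ** 2)  # standard GLL weight formula
    return xi, rho


def cardinal(i, x, xi, N):
    """Lagrange cardinal function h_i of (2.2), evaluated at x != xi[i]."""
    LN = Legendre.basis(N)
    return -(1.0 - x**2) * LN.deriv()(x) / (N * (N + 1) * LN(xi[i]) * (x - xi[i]))


N = 8
xi, rho = gll_nodes_weights(N)
# (2.1): the rule is exact up to degree 2N - 1; x**14 integrates to 2/15 on ]-1, 1[
quad_err = abs(np.sum(rho * xi**14) - 2.0 / 15.0)
# interpolation of the constant 1: the cardinal functions sum to one at any point
unity_err = abs(sum(cardinal(i, 0.3, xi, N) for i in range(N + 1)) - 1.0)
```

The cardinal functions are evaluated away from their own node because (2.2) is a removable 0/0 form at $x = \xi_i$; an implementation would normally special-case that point using (2.3).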

In the sequel the phase field $\phi$ will be approximated in the space variable by suitable
polynomial functions $\phi_N$ as follows

$$
\phi_N(x, y, t) = \sum_{i=0}^{N} \sum_{j=0}^{N} \phi_N(\xi_i, \xi_j, t)\, h_i(x)\, h_j(y).
\tag{2.4}
$$

The $L^2$-inner products involved in the calculation will be evaluated using
Gauss-Lobatto-Legendre quadrature, which reads: for all continuous functions $\varphi$ and $\psi$ in
$\Omega$,

$$
(\varphi, \psi)_\Omega \approx (\varphi, \psi)_N := \sum_{i=0}^{N} \sum_{j=0}^{N} \varphi(\xi_i, \xi_j)\, \psi(\xi_i, \xi_j)\, \rho_i \rho_j.
\tag{2.5}
$$

3 Scalar Auxiliary Variable (SAV) Approach for FACE 


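The SAV approach was introduced in [4, 5] to solve gradient flows. The main purpose
of this section is to construct an efficient unconditionally stable scheme based on this
approach for (1.1).

Throughout the paper, we assume there exists a constant $C_0$ such that
$\int_\Omega F(\phi)\,dx + C_0 > 0$. We first introduce a scalar auxiliary variable

$$
r(t) := \sqrt{\int_\Omega F(\phi)\,dx + C_0}.
$$

Then, we rewrite the phase-field equation (1.1) in an equivalent form as: find
$\phi : (0, T] \times \Omega \to \mathbb{R}$ and $r : (0, T] \to \mathbb{R}$, such that

$$
\begin{cases}
\dfrac{\partial \phi}{\partial t} = -\gamma \mu, \qquad \nabla\phi \cdot n|_{\partial\Omega} = 0, \\[2mm]
\mu = (-\Delta)^s \phi + \dfrac{r(t)}{\sqrt{\int_\Omega F(\phi)\,dx + C_0}}\, f(\phi), \\[2mm]
\dfrac{dr}{dt} = \dfrac{1}{2\sqrt{\int_\Omega F(\phi)\,dx + C_0}} \displaystyle\int_\Omega f(\phi) \dfrac{\partial \phi}{\partial t}\,dx.
\end{cases}
\tag{3.1}
$$

Theorem 3.1 If $\phi \in L^2((0, T], H^s(\Omega))$, $0 < s < 1$, is the solution of equations
(3.1), then we have the following energy dissipation law

$$
\frac{d}{dt}\left( r^2 + \frac{1}{2} |\phi|_{H^s}^2 \right) = -\gamma \|\mu\|_0^2.
\tag{3.2}
$$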
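
Proof Taking the inner product of the first two equations with $\mu$ and $\partial\phi/\partial t$ respectively,
and multiplying the third equation with $2r(t)$, then adding them together, we obtain

$$
-\gamma \|\mu\|_0^2 = \frac{d}{dt} r^2 + \left( (-\Delta)^s \phi, \frac{\partial \phi}{\partial t} \right).
\tag{3.3}
$$

Let $\phi(x, t) = \sum_{m,n=1}^{\infty} a_{m,n}(t)\, e_{m,n}(x)$; taking advantage of the orthogonality of
$\{e_{m,n}\}$, we verify

$$
\left( (-\Delta)^s \phi, \frac{\partial \phi}{\partial t} \right) = \sum_{m,n=1}^{\infty} \lambda_{m,n}^s\, a_{m,n}(t)\, a_{m,n}'(t) = \frac{1}{2} \frac{d}{dt} \sum_{m,n=1}^{\infty} \lambda_{m,n}^s\, a_{m,n}^2(t) = \frac{1}{2} \frac{d}{dt} |\phi|_{H^s}^2.
\tag{3.4}
$$

Then combining (3.3) and (3.4) proves (3.2). $\square$

The energy law (3.2) means that the SAV approach (3.1) makes the modified
energy

$$
E(\phi, r) = r^2 + \frac{1}{2} |\phi|_{H^s}^2
$$

decay in time.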
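Now we construct a second-order BDF scheme for the system (3.1).

Given initial conditions $\phi^0 = \phi_0$, and letting $r^0 = \sqrt{\int_\Omega F(\phi^0)\,dx + C_0}$, find $\phi^{n+1} \in
H^s(\Omega)$ and $r^{n+1} \in \mathbb{R}$, $n = 1, \ldots$, such that

$$
\frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t} = -\gamma \mu^{n+1},
\tag{3.5}
$$

$$
\mu^{n+1} = (-\Delta)^s \phi^{n+1} + \frac{r^{n+1}}{\sqrt{\int_\Omega F(\bar\phi^{n+1})\,dx + C_0}}\, f(\bar\phi^{n+1}),
\tag{3.6}
$$

$$
\frac{3r^{n+1} - 4r^n + r^{n-1}}{2\Delta t} = \int_\Omega \frac{f(\bar\phi^{n+1})}{2\sqrt{\int_\Omega F(\bar\phi^{n+1})\,dx + C_0}}\, \frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t}\,dx.
\tag{3.7}
$$

In the above, $\bar\phi^{n+1}$ can be any explicit approximation of $\phi(t^{n+1})$ with an error of
$O(\Delta t^2)$. For instance, we may choose the following one

$$
\bar\phi^{n+1} = 2\phi^n - \phi^{n-1}.
$$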
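
Theorem 3.2 The scheme (3.5)-(3.7) is unconditionally stable in the sense that

$$
\frac{1}{\Delta t} \Big\{ E[(\phi^{n+1}, r^{n+1}), (\phi^n, r^n)] - E[(\phi^n, r^n), (\phi^{n-1}, r^{n-1})] \Big\} \le -\gamma \|\mu^{n+1}\|_0^2,
\tag{3.8}
$$

with the modified energy

$$
E[(\phi^{n+1}, r^{n+1}), (\phi^n, r^n)] = \frac{1}{4}\left( |\phi^{n+1}|_{H^s}^2 + |2\phi^{n+1} - \phi^n|_{H^s}^2 \right) + \frac{1}{2}\left( (r^{n+1})^2 + (2r^{n+1} - r^n)^2 \right).
\tag{3.9}
$$

Proof The result can be directly deduced from taking the inner product of the
first two equations (3.5) and (3.6) with $\mu^{n+1}$ and $\frac{3\phi^{n+1} - 4\phi^n + \phi^{n-1}}{2\Delta t}$ respectively, and
multiplying the third equation (3.7) with $2r^{n+1}$, then using the following identity:

$$
2\left( a^{k+1}, 3a^{k+1} - 4a^k + a^{k-1} \right) = |a^{k+1}|^2 + |2a^{k+1} - a^k|^2 + |a^{k+1} - 2a^k + a^{k-1}|^2 - |a^k|^2 - |2a^k - a^{k-1}|^2.
$$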
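
The BDF2 identity used in the proof is purely algebraic and holds for any inner product. The short check below, offered only as a sanity test with the Euclidean inner product and random vectors (the function name is illustrative), evaluates the difference between the two sides.

```python
import numpy as np


def bdf2_identity_gap(a_new, a_cur, a_old):
    """Left-hand side minus right-hand side of the BDF2 identity (Euclidean inner product)."""
    sq = lambda v: float(np.dot(v, v))
    lhs = 2.0 * float(np.dot(a_new, 3.0 * a_new - 4.0 * a_cur + a_old))
    rhs = (
        sq(a_new)
        + sq(2.0 * a_new - a_cur)
        + sq(a_new - 2.0 * a_cur + a_old)
        - sq(a_cur)
        - sq(2.0 * a_cur - a_old)
    )
    return lhs - rhs


rng = np.random.default_rng(0)
a_new, a_cur, a_old = rng.standard_normal((3, 50))
gap = bdf2_identity_gap(a_new, a_cur, a_old)  # zero up to round-off
```

The non-negative term $|a^{k+1} - 2a^k + a^{k-1}|^2$ is the one discarded to turn the identity into the inequality (3.8).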


3.1 Implementation 


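Besides its unconditional stability, a most remarkable feature of the above scheme is
that it can be solved very efficiently. Indeed, by inserting (3.6) and (3.7) into (3.5),
and letting $\bar F^{n+1} := \int_\Omega F(\bar\phi^{n+1})\,dx + C_0$, we obtain

$$
\left( \frac{3}{2\gamma\Delta t} I + (-\Delta)^s \right) \phi^{n+1} + \frac{(f(\bar\phi^{n+1}), \phi^{n+1})}{2\bar F^{n+1}}\, f(\bar\phi^{n+1}) = g^{n+1},
\tag{3.10}
$$

where

$$
g^{n+1} = \frac{4\phi^n - \phi^{n-1}}{2\gamma\Delta t} - \left( \frac{4r^n - r^{n-1}}{3\sqrt{\bar F^{n+1}}} - \frac{(f(\bar\phi^{n+1}), 4\phi^n - \phi^{n-1})}{6\bar F^{n+1}} \right) f(\bar\phi^{n+1}).
$$

We shall first determine $(f(\bar\phi^{n+1}), \phi^{n+1})$ from (3.10). To this end we multiply
(3.10) by $\left( \frac{3}{2\gamma\Delta t} I + (-\Delta)^s \right)^{-1}$ and take the inner product with $f(\bar\phi^{n+1})$ to get

$$
\phi^{n+1} + \frac{(f(\bar\phi^{n+1}), \phi^{n+1})}{2\bar F^{n+1}}\, \psi_2^{n+1} = \psi_1^{n+1},
\tag{3.11}
$$

$$
(f(\bar\phi^{n+1}), \phi^{n+1}) + \frac{(f(\bar\phi^{n+1}), \phi^{n+1})}{2\bar F^{n+1}}\, (f(\bar\phi^{n+1}), \psi_2^{n+1}) = (f(\bar\phi^{n+1}), \psi_1^{n+1}),
\tag{3.12}
$$

with

$$
\psi_1^{n+1} = \left( \frac{3}{2\gamma\Delta t} I + (-\Delta)^s \right)^{-1} g^{n+1}, \qquad
\psi_2^{n+1} = \left( \frac{3}{2\gamma\Delta t} I + (-\Delta)^s \right)^{-1} f(\bar\phi^{n+1}).
\tag{3.13}
$$

Then, we have

$$
(f(\bar\phi^{n+1}), \phi^{n+1}) = \frac{2\bar F^{n+1}\, (f(\bar\phi^{n+1}), \psi_1^{n+1})}{2\bar F^{n+1} + (f(\bar\phi^{n+1}), \psi_2^{n+1})}.
\tag{3.14}
$$

Thus we obtain an expression to compute $\phi^{n+1}$ by bringing (3.14) back into (3.10):

$$
\phi^{n+1} = \psi_1^{n+1} - \frac{(f(\bar\phi^{n+1}), \psi_1^{n+1})}{2\bar F^{n+1} + (f(\bar\phi^{n+1}), \psi_2^{n+1})}\, \psi_2^{n+1}.
\tag{3.15}
$$

Finally we compute $r^{n+1}$ through

$$
r^{n+1} = \frac{4r^n - r^{n-1}}{3} + \frac{(f(\bar\phi^{n+1}), \phi^{n+1}) - \frac{1}{3}(f(\bar\phi^{n+1}), 4\phi^n - \phi^{n-1})}{2\sqrt{\bar F^{n+1}}}.
$$

We now summarize the algorithm of the Scalar Auxiliary Variable approach/Semi-Implicit
Second-Order Scheme (3.5)-(3.7) as follows:

1. Set $\bar\phi^{n+1} = 2\phi^n - \phi^{n-1}$, $\bar F^{n+1} = \int_\Omega F(\bar\phi^{n+1})\,dx + C_0$,
   $c_0 = (f(\bar\phi^{n+1}), 4\phi^n - \phi^{n-1})/3$, $c = \dfrac{4r^n - r^{n-1}}{3\sqrt{\bar F^{n+1}}} - \dfrac{c_0}{2\bar F^{n+1}}$,
   $g^{n+1} = \dfrac{4\phi^n - \phi^{n-1}}{2\gamma\Delta t} - c\, f(\bar\phi^{n+1})$;
2. Solve $\left( \dfrac{3}{2\gamma\Delta t} I + (-\Delta)^s \right) \psi_1^{n+1} = g^{n+1}$;
3. Solve $\left( \dfrac{3}{2\gamma\Delta t} I + (-\Delta)^s \right) \psi_2^{n+1} = f(\bar\phi^{n+1})$;
4. Compute $\xi_1 = (f(\bar\phi^{n+1}), \psi_1^{n+1})$, $\xi_2 = (f(\bar\phi^{n+1}), \psi_2^{n+1})$, $\xi = \xi_1 / (2\bar F^{n+1} + \xi_2)$;
5. Compute $\phi^{n+1} = \psi_1^{n+1} - \xi\, \psi_2^{n+1}$ and
   $r^{n+1} = \dfrac{4r^n - r^{n-1}}{3} + \dfrac{(f(\bar\phi^{n+1}), \phi^{n+1}) - c_0}{2\sqrt{\bar F^{n+1}}}$.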
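
The algorithm above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions and not the authors' implementation: it works in one space dimension, replaces the spectral discretisation by a cell-centred finite-difference Neumann Laplacian whose fractional power is formed by eigendecomposition, uses $F(\phi) = (\phi^2 - 1)^2 / (4\varepsilon^2)$, and starts the two-step scheme with $\phi^1 = \phi^0$, $r^1 = r^0$. All function and variable names (`psi1`, `psi2`, `xi`, ...) are illustrative.

```python
import numpy as np


def make_fractional_operator(n, s):
    """Matrix of (-Laplacian)^s for a cell-centred Neumann grid on ]-1, 1[ (assumed stand-in)."""
    h = 2.0 / n
    main = 2.0 * np.ones(n)
    main[0] = main[-1] = 1.0
    off = np.ones(n - 1)
    lap = (np.diag(main) - np.diag(off, 1) - np.diag(off, -1)) / h**2
    lam, vec = np.linalg.eigh(lap)          # symmetric positive semi-definite
    lam = np.clip(lam, 0.0, None)
    return (vec * lam**s) @ vec.T, h


def sav_bdf2_step(phi_n, phi_nm1, r_n, r_nm1, Ls, h, gamma, dt, eps, C0):
    """One SAV/BDF2 step following steps 1-5 of the algorithm."""
    F = lambda p: (p**2 - 1.0) ** 2 / (4.0 * eps**2)
    f = lambda p: p * (p**2 - 1.0) / eps**2
    inner = lambda a, b: h * float(np.dot(a, b))   # discrete L2 inner product

    phi_bar = 2.0 * phi_n - phi_nm1                # explicit extrapolation
    F_bar = h * float(np.sum(F(phi_bar))) + C0
    f_bar = f(phi_bar)
    c0 = inner(f_bar, 4.0 * phi_n - phi_nm1) / 3.0
    c = (4.0 * r_n - r_nm1) / (3.0 * np.sqrt(F_bar)) - c0 / (2.0 * F_bar)
    g = (4.0 * phi_n - phi_nm1) / (2.0 * gamma * dt) - c * f_bar

    A = 3.0 / (2.0 * gamma * dt) * np.eye(len(phi_n)) + Ls
    psi1 = np.linalg.solve(A, g)                   # two linear solves, same operator
    psi2 = np.linalg.solve(A, f_bar)
    xi = inner(f_bar, psi1) / (2.0 * F_bar + inner(f_bar, psi2))

    phi_new = psi1 - xi * psi2
    r_new = (4.0 * r_n - r_nm1) / 3.0 + (inner(f_bar, phi_new) - c0) / (2.0 * np.sqrt(F_bar))
    return phi_new, r_new


def modified_energy(phi_new, phi_old, r_new, r_old, Ls, h):
    """Discrete counterpart of the modified energy (3.9)."""
    semi = lambda p: h * float(np.dot(p, Ls @ p))  # |p|_{H^s}^2 on the grid
    return 0.25 * (semi(phi_new) + semi(2.0 * phi_new - phi_old)) + 0.5 * (
        r_new**2 + (2.0 * r_new - r_old) ** 2
    )
```

Each step costs two solves with the same constant-coefficient operator plus a few inner products, which is the efficiency argument made in the text; in a spectral basis that diagonalises $(-\Delta)^s$ these solves reduce to pointwise divisions.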


4 Numerical Results and Discussion


In this section, we first present a numerical example to illustrate the efficiency of the 
SAV scheme in terms of stability and accuracy. We then use the proposed scheme 
to simulate a benchmark problem. 


4.1 Test of the Convergence Order 


In order to validate the proposed SAV/BDF2 scheme for the fractional phase-field
equation, we consider a fabricated forcing term so that the exact solution to (1.1) is
$\phi(x, y, t) = \sin(t) \cos(\pi x) \cos(\pi y)$. In this test we set $\gamma = 1$, $\Omega = ]-1, 1[^2$, and the
nonlinear term is given by $f(\phi) = \phi(\phi^2 - 1)$.

In the calculation we use polynomial degree 32 x 32 for the spatial discretization,
which is large enough so that the spatial discretization error is negligible compared
to the temporal error. Figure 1 shows the $L^2$-errors at T = 1.0 in log-log scale as
a function of the time step size for several fractional orders. It is observed from
this figure that the convergence rate of the time stepping scheme is second
order, as expected, for all tested values of s. It is worth mentioning that no numerical
instability was observed for any of the time step sizes used in the calculation. This is
consistent with the unconditional stability of the proposed scheme.
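
A convergence order such as the one read off the log-log plot is commonly estimated from errors at successive time steps as $p \approx \log(e_1 / e_2) / \log(\Delta t_1 / \Delta t_2)$. The sketch below shows this computation; the error values are synthetic (generated as $0.3\,\Delta t^2$ purely for illustration) and are not results from the paper.

```python
import numpy as np


def observed_orders(dts, errors):
    """Observed convergence orders between consecutive (time step, error) pairs."""
    dts = np.asarray(dts, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return np.log(errors[:-1] / errors[1:]) / np.log(dts[:-1] / dts[1:])


dts = [0.1, 0.05, 0.025, 0.0125]
errors = [0.3 * dt**2 for dt in dts]   # synthetic second-order data, not measured values
orders = observed_orders(dts, errors)
```

With measured errors, values of `orders` close to 2 across the range of time steps would support the second-order claim.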


Fig. 1 $L^2$-errors at T = 1.0 in log-log scale with respect to the time step size $\Delta t$ for
different fractional orders s


4.2 Benchmark Test


In this subsection, we apply the SAV/BDF2 scheme to the fractional version of a 
classical benchmark problem (cf. [11]) that we describe below. Our main purpose 
in this test is to demonstrate the applicability of the constructed method for the 
FACE. We are particularly interested in numerically investigating the impact of the 
fractional order on the evolution of the phase interface. 

At the initial state, there is a circular phase interface of radius $R_0 = 100$ in
the rectangular domain $]-128, 128[^2$. In other words, the initial condition is given
by

$$
\phi_0(x) = \begin{cases} 1, & |x|^2 \le 100^2, \\ -1, & |x|^2 > 100^2. \end{cases}
$$

Such a circular interface is unstable and the driving force will make it shrink and
eventually disappear. It has been shown that in the limit where the radius of the circle is
much larger than the interfacial thickness, the velocity and the radius of the moving
interface are given (see [1]) by

$$
V = \frac{dR}{dt} = -\frac{1}{R}, \qquad R(t) = \sqrt{R_0^2 - 2t}.
$$
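
For reference, the sharp-interface radius law is easy to tabulate. The helper below evaluates $R(t) = \sqrt{R_0^2 - 2t}$ and the time $R_0^2 / 2$ at which the circle vanishes; the function names are illustrative, and returning zero after the vanishing time is a convention chosen here.

```python
import math


def sharp_interface_radius(t, r0=100.0):
    """Radius R(t) = sqrt(r0**2 - 2 t) of the shrinking circle; zero once it has vanished."""
    arg = r0**2 - 2.0 * t
    return math.sqrt(arg) if arg > 0.0 else 0.0


def extinction_time(r0=100.0):
    """Time at which the sharp-interface radius reaches zero."""
    return r0**2 / 2.0


radii = {t: sharp_interface_radius(t) for t in (0, 1000, 2000, 3000, 4000, 5000)}
```

For $R_0 = 100$ the circle vanishes at $t = 5000$, so a numerical radius for $s = 1$ can be compared against these values over the whole shrinking process.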


In the implementation we map the computational domain $]-128, 128[^2$ to
$]-1, 1[^2$. We are therefore led to solve the fractional Allen-Cahn equation
(1.1) with the coefficients $\gamma = 1/128^2$ and $\varepsilon = 0.0078$. In the simulation, the space
resolution is set to N = 512, and the time step size is $\Delta t = 0.1$. The computed
radius R(t) for s = 1 using the SAV/BDF2 scheme is plotted in Fig. 2. We observe
that R(t) decreases monotonically and stays very close to the sharp interface limit
value. This confirms the accuracy of the proposed method, at least in the case s = 1.

Next we apply the proposed scheme to investigate the impact of the fractional
order on the radius behavior. In Fig. 3 we present the numerical radius evolution
for a number of fractional orders. Specifically, Fig. 4 shows the circle shrinking
for fractional orders s = 1.0, 0.9, 0.8. It is clearly indicated that the radius decay
rate slows down when the fractional order decreases. However, for the time being
the physical meaning and mathematical explanation of this phenomenon remain
unknown. We plan to address this issue in future work.

Fig. 2 The evolution of radius R(t): comparison of the exact solution and numerical result in the
case s = 1

Fig. 3 Evolution of the radius for different fractional orders s: impact of the order on the radius
decay rate

Fig. 4 Temporal evolution of a circular domain from left to right at times t =
1000, 2000, 3000, 4000, 5000, for fractional order s = 1 (a), 0.9 (b), 0.8 (c), for the top, middle
and bottom rows, respectively


Alessio Scatto, Igor Tominec, and Pierre-Frédéric Villard


1 Introduction 


Mechanical ventilation is part of the life support given to intensive care patients.
At the same time, the ventilator causes damage to the muscles that
govern normal breathing. Normally, the muscles contract when we inhale, and
air is pulled into the lungs. During controlled mechanical ventilation, the ventilator
instead pushes the air into the lungs, which then exert a pressure on the muscles. The
function of the muscle tissue can deteriorate quite rapidly, leading to Ventilator
Induced Diaphragmatic Dysfunction (VIDD) [2]. Because of this, the rehabilitation
process, including the weaning from the ventilator, is more difficult and takes longer.

The Individual Virtual Ventilator (INVIVE) project [5] aims to study the 
mechanics of respiration through numerical simulation in order to learn more about 
the onset of VIDD, and the factors that influence its progress in a patient. This 
work is the first publication from the project and is a pilot study for the numerical 
techniques that we plan to use. 

The diaphragm is the main respiratory muscle. It has not been studied as much 
in the literature as other muscles, and not with detailed models. However, there 
are a few studies that use continuum mechanical descriptions of the muscle tissue
and simulate its behaviour using FEM [6, 10]. The main drawbacks of the FEM 
solvers are that they are time-consuming, and that meshing of complex geometries 
can be difficult. We instead propose to use a meshfree RBF-FD method [4] for the 
numerical simulation. Some of the potential advantages are that meshing can be 
replaced with scattered node generation, which in some respects is easier, and allows 
for a lot of flexibility; that it is easy to construct high-order accurate approximations 
that can reduce the computational cost; and that the method is easy to implement 
and modify, providing flexibility when performing experiments. The objectives of 
the paper are 


* to show the feasibility of using the RBF-FD method for this type of problem,

* to work with real medical data such that the results will be relevant, 

* to investigate how the high aspect ratio of the geometry affects the simulation 
and if this can be mitigated by using high aspect ratio node sets. 


The paper is organized as follows: In Sect. 2 we describe the linear elasticity
equations in three dimensions. Section 3 briefly introduces the RBF-FD method.
The process from medical images to input data for the simulation is described in
Sect. 4, which is followed by Sect. 5 on numerical experiments.


2 The Elasticity Equations 


The constitutive relations that describe the real behaviour of muscle tissue are 
non-linear. The displacement of the diaphragm is large, and should therefore also 
be modeled by non-linear elasticity equations. For our final simulation tool, we 
aim to solve the fully non-linear equations. However, for the initial development 
of meshless numerical methods for the diaphragm simulations, we use a linear 
elasticity test case. 


2.1 The Linearized Equations of Motion 


For the linear test problem, the following simplifying assumptions are made: 
The relationship between stress and strain is linear, the material is isotropic and 
homogeneous, and displacements are small. 

We define the displacement $u(X) = (u_1(X), u_2(X), u_3(X))^T \in \mathbb{R}^3$ of the tissue
from the initial configuration $X = (x, y, z)^T \in \mathbb{R}^3$ to a later configuration $X^* \in \mathbb{R}^3$
as

$$
u(X) = X^* - X.
\tag{1}
$$

The strain-displacement relationship for small displacements, $\|\nabla u\| \ll 1$, has the
form

$$
\varepsilon = \frac{1}{2}\left( \nabla u + (\nabla u)^T \right),
\tag{2}
$$

where the strain $\varepsilon \in \mathbb{R}^{3 \times 3}$ is a tensor. For a linear material, the constitutive relation
between the strain and the stress $\sigma \in \mathbb{R}^{3 \times 3}$ is characterized by the Lamé parameters
$\lambda$ and $\mu$, leading to

$$
\sigma = 2\mu\varepsilon + \lambda\, \mathrm{tr}(\varepsilon) I.
\tag{3}
$$

In tissue mechanics, the acceleration is typically small compared with the forces,
and can be neglected. The equations of motion (Newton's second law) can then be
written as

$$
\nabla \cdot \sigma + f = 0,
\tag{4}
$$

where $f \in \mathbb{R}^3$ represents body forces. We assume that (4) holds for all points
$X \in \Omega$, where $\Omega$ is the domain of interest, which for our problem is the diaphragm.
To close the problem formulation, we also need boundary conditions. The first type
is displacement boundary conditions

$$
u = g, \quad X \in \partial\Omega_D.
\tag{5}
$$

These are applied where the geometry is attached, for example where the diaphragm
is attached to the ribs and the spine. Traction boundary conditions are given in terms
of the stress as

$$
\sigma \cdot n = h, \quad X \in \partial\Omega_T.
\tag{6}
$$


These represent forces applied to the surface of the domain of interest, such as the 
pressure against the diaphragm from below generated by the abdominal compliance. 
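
As a small worked example of relations (2) and (3), the sketch below computes the small-strain tensor and the linear-elastic stress from a given displacement gradient. The numerical values of the Lamé parameters and of the displacement gradient are arbitrary illustrations, not tissue data from the paper, and the function names are hypothetical.

```python
import numpy as np


def small_strain(grad_u):
    """Small-strain tensor, relation (2): symmetric part of the displacement gradient."""
    return 0.5 * (grad_u + grad_u.T)


def linear_stress(grad_u, lam, mu):
    """Stress of an isotropic linear material, relation (3)."""
    eps = small_strain(grad_u)
    return 2.0 * mu * eps + lam * np.trace(eps) * np.eye(3)


lam, mu = 2.0, 1.0                       # illustrative Lamé parameters
stretch = np.diag([1e-3, 0.0, 0.0])      # uniaxial stretch along x
sigma = linear_stress(stretch, lam, mu)  # diag((2 mu + lam) e, lam e, lam e)

# An infinitesimal rigid rotation has a skew-symmetric gradient and produces no stress.
rotation = np.array([[0.0, -1e-3, 0.0], [1e-3, 0.0, 0.0], [0.0, 0.0, 0.0]])
sigma_rot = linear_stress(rotation, lam, mu)
```

The second case illustrates why only the symmetric part of $\nabla u$ enters the constitutive law.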
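

2.2 The Lamé-Navier PDE Formulation


The Lamé-Navier equations give the steady-state motion equation in terms of the
displacement field [13]. This means we are solving a system of three PDEs with
three unknowns. We rewrite (4) and (6) in terms of $u$ using relations (2), (3), and the
identity $\mathrm{tr}\left( \nabla u + (\nabla u)^T \right) = 2(\nabla \cdot u)$ to get

$$
(\lambda + \mu) \nabla(\nabla \cdot u) + \mu \nabla^2 u + f = 0, \quad X \in \Omega,
\tag{7}
$$

$$
u = g, \quad X \in \partial\Omega_D,
\tag{8}
$$

$$
\left[ \lambda (\nabla \cdot u) I + \mu \left( \nabla u + (\nabla u)^T \right) \right] \cdot n = h, \quad X \in \partial\Omega_T.
\tag{9}
$$

When we later discretize the system, it is more convenient to work with the operators
and the displacement in component form. The two operators in the PDE (7) applied
to $u$ expand to

$$
\nabla(\nabla \cdot u) = \begin{pmatrix} \nabla_{xx} & \nabla_{xy} & \nabla_{xz} \\ \nabla_{yx} & \nabla_{yy} & \nabla_{yz} \\ \nabla_{zx} & \nabla_{zy} & \nabla_{zz} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}, \qquad
\nabla^2 u = \begin{pmatrix} L & 0 & 0 \\ 0 & L & 0 \\ 0 & 0 & L \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix},
$$

where $L = \nabla_{xx} + \nabla_{yy} + \nabla_{zz}$. Rewriting the two terms in the traction condition (9)
in the same way yields

$$
(\nabla \cdot u) I \cdot n = \begin{pmatrix} n_1 \nabla_x & n_1 \nabla_y & n_1 \nabla_z \\ n_2 \nabla_x & n_2 \nabla_y & n_2 \nabla_z \\ n_3 \nabla_x & n_3 \nabla_y & n_3 \nabla_z \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix},
$$

$$
\left( \nabla u + (\nabla u)^T \right) \cdot n = \begin{pmatrix} T_{1x} & n_2 \nabla_x & n_3 \nabla_x \\ n_1 \nabla_y & T_{2y} & n_3 \nabla_y \\ n_1 \nabla_z & n_2 \nabla_z & T_{3z} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix},
$$

where $T_{i\alpha} = n_i \nabla_\alpha + n_1 \nabla_x + n_2 \nabla_y + n_3 \nabla_z$.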


3 The RBF-FD Numerical Method 


In the RBF-FD method [4], scattered node stencil approximations are used for
representing the differential operators in the PDE and the boundary conditions. Let
$X_1, \ldots, X_N$ be a global set of node points, and let $u_{ij} \approx u_i(X_j)$. We collect the
unknown displacement values in the vectors $U_i = (u_{i1}, \ldots, u_{iN})^T$. When we want
to approximate the result of a differential operator $\mathcal{D}$ applied to $u_i$, we first find
a local neighbourhood $X_1^{(j)}, \ldots, X_n^{(j)}$ with local unknowns $u_{ik}^{(j)}$ to the point $X_j$,
where we want to evaluate the result. The stencil approximation then takes the form

$$
\mathcal{D} u_i(X_j) \approx \sum_{k=1}^{n} w_k^{(j)} u_{ik}^{(j)}.
\tag{10}
$$

The weights are computed for each point in the global node set by solving a linear
system of size n x n, where the stencil size n < N. In this work, we consider stencil
approximations where RBFs augmented by a polynomial basis are used. The small
linear systems then take the form

$$
\begin{pmatrix} A & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} w \\ \gamma \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix},
\tag{11}
$$

where $A(i, k) = \phi(\|X_i^{(j)} - X_k^{(j)}\|)$, where $\phi(r)$ is an RBF, for $i, k = 1, \ldots, n$, and
where $P(i, k) = p_k(X_i^{(j)})$, for $i = 1, \ldots, n$ and $k = 1, \ldots, m$. The polynomials
$p_k$ are chosen as the lowest degree monomial basis with dimension m, and m is
usually chosen such that a full basis for a certain maximum degree K is obtained.
The right hand side vectors are defined by $b(i) = \mathcal{D}\phi(\|X_j - X_i^{(j)}\|)$ for $i = 1, \ldots, n$,
and $c(i) = \mathcal{D}p_i(X_j)$ for $i = 1, \ldots, m$. The vector $\gamma$ can be seen as a Lagrange
multiplier in this problem and is discarded. The stencil approximation is exact for
polynomials up to degree K as can be seen from the last block row in the system,
and it is also exact for the RBFs centered at the stencil nodes.
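
The small stencil system can be illustrated with a short sketch. To keep it compact, the example below makes several assumptions that differ from the paper's setting: it works in two dimensions rather than three, takes the differential operator to be the Laplacian, uses the cubic polyharmonic spline $\phi(r) = r^3$ (whose 2D Laplacian is $9r$) and a full quadratic monomial basis (K = 2, m = 6), and shifts the stencil so that the evaluation point is the origin. The function name is illustrative.

```python
import numpy as np


def rbf_fd_laplacian_weights(nodes, center):
    """RBF-FD weights approximating the 2D Laplacian at `center` from values at `nodes`.

    Solves the augmented system [[A, P], [P^T, 0]] [w; gamma] = [b; c] with the
    cubic polyharmonic spline and monomials up to degree 2.
    """
    d = nodes - center                      # shift so the evaluation point is the origin
    n = len(d)
    r = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=2)
    A = r**3                                # A(i, k) = phi(|X_i - X_k|)
    x, y = d[:, 0], d[:, 1]
    P = np.column_stack([np.ones(n), x, y, x**2, x * y, y**2])
    m = P.shape[1]
    M = np.block([[A, P], [P.T, np.zeros((m, m))]])
    b = 9.0 * np.linalg.norm(d, axis=1)     # Laplacian of r^3 in 2D, evaluated at the origin
    c = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 2.0])  # Laplacian of each monomial at the origin
    sol = np.linalg.solve(M, np.concatenate([b, c]))
    return sol[:n]                          # the Lagrange multiplier part is discarded


rng = np.random.default_rng(0)
center = np.array([0.3, -0.2])
nodes = np.vstack([center, center + rng.uniform(-1.0, 1.0, size=(14, 2))])
w = rbf_fd_laplacian_weights(nodes, center)

# Exactness check on a quadratic: u = x^2 + 3xy - 2y^2 + x has Laplacian 2 - 4 = -2.
u = lambda p: p[:, 0] ** 2 + 3.0 * p[:, 0] * p[:, 1] - 2.0 * p[:, 1] ** 2 + p[:, 0]
approx = float(w @ u(nodes))
```

The last block row of the system enforces $P^T w = c$, which is exactly the statement that the weights reproduce the operator on every polynomial of the basis; the check on the quadratic exercises that property.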

A global differentiation matrix $D$ is assembled by inserting the weights corresponding
to $X_j$ in the $j$th row of the matrix, and in the columns corresponding to
the global indices of the nodes $X_k^{(j)}$ in the local neighbourhood ($X_j$ is normally one
of the points in the neighbourhood). Then we can compute

$$
(\mathcal{D}u_i(X_1), \ldots, \mathcal{D}u_i(X_N))^T = D U_i. \qquad (12)
$$


When solving the PDE problem (7)-(9), $u$ is replaced with the discrete field
variables, and the differential operators are replaced with the corresponding
differentiation matrices. The PDE operator is applied for interior node points, and the
boundary operators at boundary node points.

In recent work on RBF-FD methods it has been found that a combination of
polyharmonic spline RBFs $\phi(r) = |r|^{2k+1}$, $k > 0$, with polynomials up to degree
$K$ has excellent approximation properties [1, 3]. The (asymptotic) convergence rate
is guided by the polynomial degree $K$, and oscillations near boundaries, which are
common both with pure RBF and pure polynomial approximations, are suppressed
as soon as $K$ is large enough. In this work, we use the cubic polyharmonic spline

$$
\phi(r) = |r|^3. \qquad (13)
$$


4 The Medical Image Input Data


The medical research questions are the motivation for the INVIVE project, and it 
is important that the numerical simulations can emulate what is seen in the medical 
image data. To start with, we use medical images to extract the real diaphragm 
geometry. We also use image data to find the displacement of the diaphragm at 
different times during the respiratory cycle. Later in the project medical image data 
will also be used for validation of the numerical simulations. 


4.1 Medical Image Acquisition 


The type of medical image data that is available to us is thoracic 3-D CT images 
acquired using a TOSHIBA Aquilion ONE CT scan machine. The images were 
captured at Azienda Ospedaliera di Padova from adult patients that were subjected 
to the CT scan for medical reasons (the CT scans were not performed only for 
research). The images were made and are used in anonymous form. The computed 
3-D images are associated with two specific times in the breathing cycle or, 
equivalently, with two different states of lung inflation. The images have a pixel 
size of 0.927 × 0.927 mm² and a slice thickness of 0.3 mm. They have a resolution
of 512 × 512 × 1500 that includes the thoracic and abdominal regions. Examples of
image views are shown in Fig. 1. 


4.2 Converting Image Data to Mesh-Based Geometry Data 


Automated segmentation methods are currently not able to identify the diaphragm
that is barely visible in the images. Therefore, the diaphragm was manually
segmented on a Wacom tablet using a method similar to the description in [14].
The segmentation time is roughly 6 h for one 3-D image. The manual segmentation
method consists in following the organs that are known to surround the diaphragm,
such as the bottom of the lungs, the top of the liver, and the inside of the ribs.
Figure 1 shows the result of the segmentation.

Fig. 1 Manual segmentation of the diaphragm. Red: diaphragm, yellow: lungs, blue: bones

Fig. 2 Left: The initial 3D mesh and the decimated mesh with 1000 vertices. Right: The sagittal
cut and centers of gravity (green)

The labelled voxel data is then converted into a mesh with the marching cubes
algorithm. It contains around $1.5 \cdot 10^6$ vertices due to the CT scan resolution. The
initial mesh is then decimated using Vorpaline [9], a fast and automatic method,
where the only input is the number of final points comprising the mesh. The initial
mesh and a decimated mesh are shown in Fig. 2.

Both when implementing the boundary conditions and for node generation, it is 
necessary to be able to identify vertices belonging to different parts of the surface 
of the geometry. Two relevant sections are the upper thoracic surface and the lower 
abdominal surface. These correspond to two different pressure regions. 

To separate the surface components, we employ the following algorithm: First the
whole diaphragm is separated into a left and right part. We orient the diaphragm
such that the parameter $t \in [t_{min}, t_{max}]$ describes a position from left to right,
we let $V(\cdot)$ denote the volume of a convex region, we let $C(\cdot)$ denote the convex
hull of a node set, and we let $\Omega(t_1, t_2)$ be the part of the diaphragm that falls within
that range of $t$. Then we can find the sagittal cut $t_{sep}$ as the position that maximizes
the sum of the left and right volume

$$
t_{sep} = \operatorname*{argmax}_{t_{min} \le t \le t_{max}}
\left[ V\big(C(\{X_j \mid X_j \in \Omega(t_{min}, t)\})\big) + V\big(C(\{X_j \mid X_j \in \Omega(t, t_{max})\})\big) \right].
$$

The result is illustrated in the right panel of Fig. 2, where also the two centres of
gravity $c_L$ and $c_R$, for the left and right part respectively, are indicated.

For each surface vertex $X_j \in \Omega_i$, for $i = L, R$, of the diaphragm, a vertex
location tag is given by the dot product between the diaphragm vertex normal $n_j$
and the normalized vector $v_j = (X_j - c_i)/\|X_j - c_i\|$ in the direction from the
center of gravity to the vertex:

$$
\mathrm{tag}(X_j) =
\begin{cases}
\text{thorax}, & \text{if } n_j \cdot v_j > 0, \\
\text{abdomen}, & \text{otherwise.}
\end{cases} \qquad (14)
$$

Finally, to avoid artifacts, only the largest connected component of each tagged location
is kept, and the disconnected parts are changed to the other location.
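The tagging rule (14) amounts to a sign test on a dot product. The sketch below illustrates that test for a single vertex; the function name `tag_vertex` and the tuple-based data layout are assumptions made for illustration, and the connected-component clean-up step is not included.

```python
import math


def tag_vertex(x, normal, centre):
    """Tag a surface vertex as 'thorax' or 'abdomen' following rule (14).

    x      : vertex position X_j
    normal : vertex normal n_j
    centre : centre of gravity c_i of the half (left or right) containing x
    """
    direction = [xi - ci for xi, ci in zip(x, centre)]
    length = math.sqrt(sum(d * d for d in direction))
    v = [d / length for d in direction]          # normalized vector v_j
    dot = sum(n * c for n, c in zip(normal, v))  # n_j . v_j
    return "thorax" if dot > 0 else "abdomen"


# A vertex above the centre whose normal points away from it
print(tag_vertex((0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)))
```

A normal pointing away from the centre of gravity gives a positive dot product, a normal pointing back towards it gives a negative one, which is what separates the two surfaces of a thin, dome-shaped body.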


4.3 Final Geometry Representation and Node Generation 


Based on the OGr method [11, 12] and a least-squares RBF-partition of unity 
method [7], the mesh-based geometry is smoothed and parametrized. The details 
of this process are described in a forthcoming paper [8]. 

Scattered node sets of different resolutions are generated from the smoothed
geometry. A level set function inside the volume is used for anisotropic node
placement such that the resolution in the direction normal to the surface is higher
than along the surface.


4.4 A Test Problem with Real Displacements 


We are still working on the analysis of the images shown in the previous
section. Therefore, we use an older data set with somewhat lower resolution for the
test case and the numerical experiments. We only have the end-of-inhalation
state segmented at this point. To define a realistic displacement function, we have
identified nine different landmarks on the diaphragm. There are four insertion
points of the diaphragm that we take as immobile. These are the left and right
transverse processes of the two lowest thoracic vertebrae T11 and T12. The five
moving landmarks and their displacements are given in Table 1. We augment this 
information by also requiring the extremal points of the lower edge of the diaphragm 
to be immobile, and the thickness change from contracted to relaxed state at the 
two domes to be 66%. We then interpolate the displacements at the augmented 
landmarks by the |r|’ polyharmonic spline. The initial and displaced states are 
shown in Fig.3. As the first test problem, we solve for the interior displacement 
given that the boundary displacement changes from the relaxed to the contracted 
state. 


Table 1 Displacements of five landmark points

       Right dome   Left dome   Right costophrenic recess   Left costophrenic recess   Xiphoid process
u_1    -1.08        1.08        -1.08                       -2.16                      0
u_2    4.32         4.32        0                           -2.16                      -1.08
u_3    -2.50        -7.50       -2.50                       -10.00                     0

Fig. 3 Initial node locations (higher, red) and displaced node locations (lower, blue),
using the constructed displacement function for a node set with N = 8404 nodes


5 Numerical Experiments 


A main concern when solving the linear elasticity problem for the diaphragm is
the high aspect ratio of the geometry. The overall size of the diaphragm is around
30 × 20 × 15 cm, while the thickness is just a few mm. In the experiments we want
to test how important the resolution in the normal direction is for the results. Our
hypothesis is that it needs to be large enough to allow for a stencil with a similar
number of nodes in each dimension. That is, we need at least $\sqrt[3]{n}$ nodes in the normal
direction. We compare two cases, (i) using uniform node sets with similar distances
in the normal and tangential directions, and (ii) using node sets that are refined in
the normal direction according to the stencil size. Convergence is tested against a
reference solution computed at a higher resolution.

The left part of Fig. 4 shows the convergence of the displacements. The errors are
larger for case (i), and no convergence trend is observed for the largest stencil size.
The number of points in the normal direction increases gradually as N increases. For
case (ii), the errors are smaller, and convergence is observed in all cases. When using
polyharmonic splines in combination with polynomials, we expect the convergence
rate to be of order $h^{K+1}$, where $h$ is a measure of the node spacing and $K$ is the
maximum degree of the polynomial terms [3]. However, for case (ii), we get the
same rate of convergence for all $K$. One reason can be that the normal refinement
is constant when the tangential refinement is increased. There may also be issues
concerning the smoothness of the node distribution and/or the solution.

Fig. 4 Left: Convergence of the displacement against the reference solution for uniform nodes
(dashed) and nodes refined in the normal direction (solid) for n = 50, K = 3 (square), n = 78,
K = 4 (circle), and n = 120, K = 5 (×), where n is the stencil size and K is the order of
the polynomial basis augmenting the polyharmonic spline functions. Right: Convergence of the
stresses against the reference solution for n = 50. The slopes p, with -p indicating the order of
convergence, are also shown

In the right part of Fig. 4, we display the convergence of the functions in the
stress tensor, computed for the interior nodes for case (ii). The convergence rates are
similar to those of the displacement. This is also unexpected, as we would normally
expect a derivative of order $k$ to converge as $h^{K+1-k}$ [3].

In Fig. 5, we show the components of the stress tensor, computed for the interior 
nodes for case (ii). We can see that the magnitude of the stresses is large at the 
domes where we enforce compression of the muscle. 


6 Conclusions 


We have developed a pipeline for converting CT image data into input data for 
numerical simulation. The main bottleneck is the manual segmentation of the 
diaphragm. One thing that will be investigated in future work is if a mapping from 
a reference geometry can be used to simplify this step. 

When the thin dimension is resolved with enough node points, the RBF-FD
approximations converge as the number of nodes increases. The stresses can also
be computed with similar accuracy. This shows that it is possible to use this type of
discretization, but further work is needed on how to generate smooth non-uniform
node sets, and also on the implementation of more advanced test problems.

Fig. 5 The six components of the stress tensor evaluated for the interior nodes

1 Motivations and Objectives


The DGTD method is nowadays a very popular numerical method in the computational
electromagnetics community. Most existing works are concerned with
time explicit DGTD methods relying on the use of a single global time step
computed so as to ensure stability of the simulation. It is however well known
that when combined with an explicit time integration method and in the presence
of an unstructured locally refined mesh, a high order DGTD method suffers from a
severe time step size restriction. An alternative approach that has been considered
in [5, 7, 16] is to use a hybrid explicit-implicit (or locally implicit) time integration
strategy. Such a strategy relies on a component splitting deduced from a partitioning
of the mesh cells in two sets respectively gathering coarse and fine elements. The
computational efficiency of this locally implicit DGTD method depends on the size
of the set of fine elements that directly influences the size of the sparse part of the
matrix system to be solved at each time step. Therefore, an approach for reducing the size
of the subsystem of globally coupled (i.e. implicit) unknowns is worth considering
if one wants to solve very large-scale problems.

A particularly appealing solution in this context is given by the concept of
hybridizable discontinuous Galerkin (HDG) method. The HDG method was
first introduced by Cockburn et al. in [4] for a model elliptic problem and has
been subsequently developed for a variety of PDE systems in continuum mechanics
[13]. The essential ingredients of a HDG method are a local Galerkin projection
of the underlying system of PDEs at the element level onto spaces of polynomials
to parameterize the numerical solution in terms of the numerical trace; a judicious
choice of the numerical flux to provide stability and consistency; and a global jump
condition that enforces the continuity of the numerical flux to arrive at a global
weak formulation in terms of the numerical trace. The HDG methods are fully
implicit, high-order accurate and, most importantly, they reduce the globally coupled
unknowns to the approximate trace of the solution on element boundaries, thereby
leading to a significant reduction in the degrees of freedom. HDG methods for the
system of time-harmonic Maxwell equations have been proposed in [9, 10, 14].
So far, we have only developed the implicit HDG method for the time-domain Maxwell
equations [3]. In view of devising a hybrid explicit-implicit HDG method, a
preliminary step is therefore to elaborate on the principles of a fully explicit
HDG formulation. Fully explicit HDG methods have been studied
recently for the acoustic wave equation by Kronbichler et al. [8] and Stanglmeier et
al. [15]. In [15] the authors present a fully explicit HDG method that is high order
accurate in both space and time. In this paper we outline the formulation of this
explicit HDGTD method and present numerical results, including a preliminary assessment of its
superconvergence properties. We adopt a low storage Runge-Kutta scheme [2] for
the time integration of the semi-discrete HDG equations. This work is a first step
towards the construction of a hybrid explicit-implicit HDG method for time-domain
electromagnetics.


2 Problem Statement and Notations 


We consider the system of 3D time-domain Maxwell equations on a bounded
polyhedral domain $\Omega \subset \mathbb{R}^3$

$$
\begin{aligned}
\varepsilon \partial_t E - \operatorname{curl} H &= -J, && \text{in } \Omega \times [0, T], \\
\mu \partial_t H + \operatorname{curl} E &= 0, && \text{in } \Omega \times [0, T],
\end{aligned} \qquad (1)
$$

where the symbol $\partial_t$ denotes a time derivative, $J$ the current density, $T$ a final time,
$E(x, t)$ and $H(x, t)$ are the electric and magnetic fields. The dielectric permittivity
$\varepsilon$ and the magnetic permeability $\mu$ are varying in space, time-invariant and both
positive functions. The boundary of $\Omega$ is defined as $\partial\Omega = \Gamma_m \cup \Gamma_a$ with $\Gamma_m \cap \Gamma_a = \emptyset$.
The boundary conditions are chosen as

$$
\begin{aligned}
n \times E &= 0, && \text{on } \Gamma_m \times [0, T], \\
n \times E + n \times (n \times H) &= n \times E^{inc} + n \times (n \times H^{inc}) = g^{inc}, && \text{on } \Gamma_a \times [0, T].
\end{aligned} \qquad (2)
$$

Here $n$ denotes the unit outward normal to $\partial\Omega$ and $(E^{inc}, H^{inc})$ a given incident field.
The first boundary condition is often referred to as a metallic boundary condition and
is applied on a perfectly conducting surface. The second relation is an absorbing
boundary condition and takes here the form of the Silver-Müller condition. It is
applied on a surface corresponding to an artificial truncation of a theoretically
unbounded propagation domain. Finally, the system is supplemented with initial
conditions: $E_0(x) = E(x, 0)$ and $H_0(x) = H(x, 0)$. For the sake of simplicity, we omit
the volume source term $J$ in what follows.

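We now introduce the notations and approximation spaces. We first consider a
partition $\mathcal{T}_h$ of $\Omega \subset \mathbb{R}^3$ into a set of tetrahedra. Each non-empty intersection of
two elements $K^+$ and $K^-$ is called an interface. We denote by $\mathcal{F}_h^I$ the union of all
interior interfaces of $\mathcal{T}_h$, by $\mathcal{F}_h^B$ the union of all boundary interfaces of $\mathcal{T}_h$, and
$\mathcal{F}_h = \mathcal{F}_h^I \cup \mathcal{F}_h^B$. Note that $\partial\mathcal{T}_h$ represents all the interfaces $\partial K$ for all $K \in \mathcal{T}_h$. As
a result, an interior interface shared by two elements appears twice in $\partial\mathcal{T}_h$, unlike in
$\mathcal{F}_h$ where this interface is evaluated once. For an interface $F \in \mathcal{F}_h^I$, $F = K^+ \cap K^-$,
let $v^\pm$ be the traces of $v$ on $F$ from the interior of $K^\pm$. On this interior face, we define
mean values as $\{v\}_F = (v^+ + v^-)/2$ and jumps as $[\![v]\!]_F = n^+ \times v^+ + n^- \times v^-$ where
the unit outward normal vector to $K^\pm$ is denoted by $n^\pm$. For the boundary faces these
expressions are modified as $\{v\}_F = v^+$ and $[\![v]\!]_F = n^+ \times v^+$ since we assume $v$ is
single-valued on the boundaries. In the following, we introduce the discontinuous
finite element spaces and some basic operations on these spaces for later use. Let
$\mathbb{P}_{p_K}(K)$ denote the space of polynomial functions of degree at most $p_K$ on the
element $K \in \mathcal{T}_h$. The discontinuous finite element space is introduced as

$$
V_h = \left\{ v \in [L^2(\Omega)]^3 \text{ such that } v|_K \in [\mathbb{P}_{p_K}(K)]^3, \ \forall K \in \mathcal{T}_h \right\}, \qquad (3)
$$

where $L^2(\Omega)$ is the space of square integrable functions on the domain $\Omega$. The
functions in $V_h$ are continuous inside each element and discontinuous across the
interfaces between elements. In addition, we introduce a traced finite element space

$$
M_h = \left\{ \eta \in [L^2(\mathcal{F}_h)]^3 \text{ such that } \eta|_F \in [\mathbb{P}_{p_F}(F)]^3
\text{ and } (\eta \cdot n)|_F = 0, \ \forall F \in \mathcal{F}_h \right\}. \qquad (4)
$$

For two vectorial functions $u$ and $v$ in $[L^2(D)]^3$, we denote $(u, v)_D = \int_D u \cdot v \, dx$
provided $D$ is a domain in $\mathbb{R}^3$, and we denote $\langle u, v \rangle_F = \int_F u \cdot v \, ds$ if $F$ is a
two-dimensional face. Accordingly, for the mesh $\mathcal{T}_h$ we have

$$
(\cdot, \cdot)_{\mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} (\cdot, \cdot)_K, \qquad
\langle \cdot, \cdot \rangle_{\partial\mathcal{T}_h} = \sum_{K \in \mathcal{T}_h} \langle \cdot, \cdot \rangle_{\partial K},
$$

$$
\langle \cdot, \cdot \rangle_{\mathcal{F}_h} = \sum_{F \in \mathcal{F}_h} \langle \cdot, \cdot \rangle_F, \qquad
\langle \cdot, \cdot \rangle_{\Gamma_a} = \sum_{F \in \mathcal{F}_h \cap \Gamma_a} \langle \cdot, \cdot \rangle_F.
$$

We set $v^t = -n \times (n \times v)$, $v^n = n(n \cdot v)$ where $v^t$ and $v^n$ are the tangential and
normal components of $v$ such that $v = v^t + v^n$.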
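The tangential and normal splitting can be checked numerically for a given unit normal. The sketch below uses plain tuples for 3D vectors; the helper names `cross`, `dot`, `tangential` and `normal_part` are hypothetical and only serve this illustration.

```python
def cross(a, b):
    """Cross product a x b of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Euclidean dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def tangential(n, v):
    """Tangential component v^t = -n x (n x v), for a unit normal n."""
    return tuple(-c for c in cross(n, cross(n, v)))


def normal_part(n, v):
    """Normal component v^n = n (n . v), for a unit normal n."""
    s = dot(n, v)
    return tuple(s * c for c in n)


n = (0.0, 0.0, 1.0)
v = (1.0, 2.0, 3.0)
vt, vn = tangential(n, v), normal_part(n, v)
print(vt, vn)
```

For this choice of `n` and `v` the two parts add up to `v`, and `cross(n, v)` equals `cross(n, vt)`, which is the identity used later to express the boundary terms through tangential traces only.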


3 Principles and Formulation of the HDG Method 


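Following the classical DG approach, approximate solutions $(E_h, H_h)$, for all $t \in [0, T]$,
are sought in the space $V_h \times V_h$ satisfying, for all $K$ in $\mathcal{T}_h$,

$$
\begin{aligned}
(\varepsilon \partial_t E_h, v)_K - (\operatorname{curl} H_h, v)_K &= 0, && \forall v \in V_h, \\
(\mu \partial_t H_h, v)_K + (\operatorname{curl} E_h, v)_K &= 0, && \forall v \in V_h.
\end{aligned} \qquad (5)
$$

Applying Green's formula on both equations of (5) introduces boundary terms
which are replaced by numerical traces $\hat{E}_h$ and $\hat{H}_h$ in order to ensure the connection
between element-wise solutions and global consistency of the discretization. This
leads to the global formulation, for all $t \in [0, T]$,

$$
\begin{aligned}
(\varepsilon \partial_t E_h, v)_K - (H_h, \operatorname{curl} v)_K + \langle \hat{H}_h, n \times v \rangle_{\partial K} &= 0, && \forall v \in V_h, \\
(\mu \partial_t H_h, v)_K + (E_h, \operatorname{curl} v)_K - \langle \hat{E}_h, n \times v \rangle_{\partial K} &= 0, && \forall v \in V_h.
\end{aligned} \qquad (6)
$$

It is straightforward to verify that $n \times v = n \times v^t$ and $\langle H, n \times v \rangle = -\langle n \times H, v \rangle$.
Therefore, using numerical traces defined in terms of the tangential components $\hat{H}_h^t$
and $\hat{E}_h^t$, we can rewrite (6) as

$$
\begin{aligned}
(\varepsilon \partial_t E_h, v)_K - (H_h, \operatorname{curl} v)_K + \langle \hat{H}_h^t, n \times v \rangle_{\partial K} &= 0, && \forall v \in V_h, \\
(\mu \partial_t H_h, v)_K + (E_h, \operatorname{curl} v)_K - \langle \hat{E}_h^t, n \times v \rangle_{\partial K} &= 0, && \forall v \in V_h.
\end{aligned} \qquad (7)
$$

The hybrid variable $\Lambda_h$ introduced in the setting of a HDG method [4] is here
defined for all the interfaces of $\mathcal{F}_h$ as

$$
\Lambda_h := \hat{H}_h^t, \quad \forall F \in \mathcal{F}_h. \qquad (8)
$$

An Explicit HDG Method for the 3D Time-Domain Maxwell Equations

We want to determine the fields $H_h$ and $E_h$ in each element $K$ of $\mathcal{T}_h$ by solving
system (7) and assuming that $\Lambda_h$ is known on all the faces of an element $K$. We
consider a numerical trace $\hat{E}_h^t$ for all $K$ given by

$$
\hat{E}_h^t = E_h^t + \tau_K \, n \times (\Lambda_h - H_h^t) \quad \text{on } \partial K, \qquad (9)
$$

where $\tau_K$ is a local stabilization parameter which is assumed to be strictly positive.
We recall that $n \times H_h = n \times H_h^t$. The definitions of the hybrid variable (8) and
numerical trace (9) are exactly those adopted in the context of the formulation of
HDG methods for the 3D time-harmonic Maxwell equations [10-12, 14].

Following the HDG approach, when the hybrid variable $\Lambda_h$ is known for all the
faces of the element $K$, the electromagnetic field can be determined by solving the
local system (7) using (8) and (9).

From now on, we denote by $g^{inc}$ the $L^2$ projection of $g^{inc}$ on $M_h$. Summing
the contributions of (7) over all the elements and enforcing the continuity of
the tangential component of $\hat{E}_h$, we can formulate a problem which is to find
$(E_h, H_h, \Lambda_h) \in V_h \times V_h \times M_h$ such that for all $t \in [0, T]$

$$
\begin{aligned}
(\varepsilon \partial_t E_h, v)_{\mathcal{T}_h} - (H_h, \operatorname{curl} v)_{\mathcal{T}_h} + \langle \Lambda_h, n \times v \rangle_{\partial\mathcal{T}_h} &= 0, && \forall v \in V_h, \\
(\mu \partial_t H_h, v)_{\mathcal{T}_h} + (E_h, \operatorname{curl} v)_{\mathcal{T}_h} - \langle \hat{E}_h^t, n \times v \rangle_{\partial\mathcal{T}_h} &= 0, && \forall v \in V_h, \\
\langle [\![\hat{E}_h]\!], \eta \rangle_{\mathcal{F}_h} - \langle \Lambda_h, \eta \rangle_{\Gamma_a} - \langle g^{inc}, \eta \rangle_{\Gamma_a} &= 0, && \forall \eta \in M_h,
\end{aligned} \qquad (10)
$$

where the last equation is called the conservativity condition, with which we ask the
tangential component of $\hat{E}_h$ to be weakly continuous across any interface between
two neighboring elements.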

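We now reformulate the system with numerical fluxes. We can deduce from the
third equation of (10) that

$$
\Lambda_h =
\begin{cases}
\dfrac{1}{\tau_{K^+} + \tau_{K^-}} \left( 2 \{\tau_K H_h^t\}_F + [\![E_h]\!]_F \right), & \text{if } F \in \mathcal{F}_h^I, \\[2mm]
\dfrac{1}{\tau_K} \, n \times E_h + H_h^t, & \text{if } F \in \mathcal{F}_h \cap \Gamma_m, \\[2mm]
\dfrac{1}{\tau_K + 1} \left( \tau_K H_h^t + n \times E_h - g^{inc} \right), & \text{if } F \in \mathcal{F}_h \cap \Gamma_a.
\end{cases} \qquad (11)
$$

By replacing (11) in (9) we obtain $\hat{E}_h^t = \hat{E}_h^{t,+} = \hat{E}_h^{t,-}$ with

$$
\hat{E}_h^t =
\begin{cases}
\dfrac{\tau_{K^+} \tau_{K^-}}{\tau_{K^+} + \tau_{K^-}} \left( 2 \left\{ \dfrac{1}{\tau_K} E_h^t \right\}_F - [\![H_h^t]\!]_F \right), & \text{if } F \in \mathcal{F}_h^I, \\[2mm]
0, & \text{if } F \in \mathcal{F}_h \cap \Gamma_m, \\[2mm]
\dfrac{1}{\tau_K + 1} \left( E_h^t - \tau_K \, n \times H_h - \tau_K \, n \times g^{inc} \right), & \text{if } F \in \mathcal{F}_h \cap \Gamma_a.
\end{cases} \qquad (12)
$$

Thus, the numerical traces (8) and (9) have been reformulated from the conservativity
condition. This means that the conservativity condition is now included in the
new formulation of the numerical fluxes and can be neglected in the global system of
equations. Hence, the local system (6) takes the form of a classical DG formulation,
$\forall v \in V_h$,

$$
\begin{aligned}
(\varepsilon \partial_t E_h, v)_K - (H_h, \operatorname{curl} v)_K + \langle \hat{H}_h^t, n \times v \rangle_{\partial K} &= 0, \\
(\mu \partial_t H_h, v)_K + (E_h, \operatorname{curl} v)_K - \langle \hat{E}_h^t, n \times v \rangle_{\partial K} &= 0,
\end{aligned} \qquad (13)
$$

where the numerical fluxes are defined by (11) and (12).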
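The interior-face case of (11) and (12) can be checked pointwise: inserting the hybrid variable into the trace definition (9) from either side of the face should give the same single-valued trace. The sketch below does this for arbitrary constant vectors at one point of a face; all function names and the sample values are hypothetical and chosen only for illustration.

```python
def cross(a, b):
    """Cross product a x b of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tangential(n, v):
    """Tangential component v^t = -n x (n x v), for a unit normal n."""
    return tuple(-c for c in cross(n, cross(n, v)))


def hybrid_interior(tau_p, tau_m, n_p, E_p, E_m, H_p, H_m):
    """Hybrid variable on an interior face, first case of (11)."""
    n_m = tuple(-c for c in n_p)
    jump_E = tuple(a + b for a, b in zip(cross(n_p, E_p), cross(n_m, E_m)))
    Ht_p, Ht_m = tangential(n_p, H_p), tangential(n_m, H_m)
    # 2 {tau H^t} = tau^+ H^{t,+} + tau^- H^{t,-}
    return tuple((tau_p * hp + tau_m * hm + je) / (tau_p + tau_m)
                 for hp, hm, je in zip(Ht_p, Ht_m, jump_E))


def trace_eq9(tau, n, E, H, lam):
    """Numerical trace (9) seen from one element: E^t + tau n x (Lambda - H^t)."""
    Et, Ht = tangential(n, E), tangential(n, H)
    corr = cross(n, tuple(l - h for l, h in zip(lam, Ht)))
    return tuple(e + tau * c for e, c in zip(Et, corr))


def trace_eq12(tau_p, tau_m, n_p, E_p, E_m, H_p, H_m):
    """Closed form of the trace on an interior face, first case of (12)."""
    n_m = tuple(-c for c in n_p)
    Et_p, Et_m = tangential(n_p, E_p), tangential(n_m, E_m)
    jump_H = tuple(a + b for a, b in zip(cross(n_p, H_p), cross(n_m, H_m)))
    factor = tau_p * tau_m / (tau_p + tau_m)
    # 2 {E^t / tau} = E^{t,+} / tau^+ + E^{t,-} / tau^-
    return tuple(factor * (ep / tau_p + em / tau_m - jh)
                 for ep, em, jh in zip(Et_p, Et_m, jump_H))


n_p = (2.0 / 3.0, 1.0 / 3.0, 2.0 / 3.0)            # unit normal of K^+
E_p, E_m = (1.0, -2.0, 0.5), (0.3, 0.7, -1.1)      # sample field values
H_p, H_m = (-0.4, 0.9, 2.0), (1.5, -0.2, 0.6)
tau_p, tau_m = 2.0, 0.5

lam = hybrid_interior(tau_p, tau_m, n_p, E_p, E_m, H_p, H_m)
from_plus = trace_eq9(tau_p, n_p, E_p, H_p, lam)
from_minus = trace_eq9(tau_m, tuple(-c for c in n_p), E_m, H_m, lam)
closed_form = trace_eq12(tau_p, tau_m, n_p, E_p, E_m, H_p, H_m)
print(from_plus, from_minus, closed_form)
```

The three printed vectors agree up to rounding, which is the pointwise statement that the trace is the same when computed from $K^+$, from $K^-$, or from the closed form.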


Remark 3 Let $Y_K = \sqrt{\varepsilon_K / \mu_K}$ be the local admittance associated to cell $K$ and
$Z_K = 1/Y_K$ the corresponding local impedance. If we set $\tau_K = Z_K$ in (11) and
$1/\tau_K = Y_K$ in (12), the obtained numerical traces coincide with those adopted in
the classical upwind flux DGTD method [6].


4 Numerical Results 


In order to validate and study the numerical convergence of the proposed HDG
method, we consider the propagation of an eigenmode in a closed cavity ($\Omega$
is the unit cube) with perfectly metallic walls. The frequency of the wave is
$f = \frac{\sqrt{3}}{2} c_0$ where $c_0$ is the speed of light in vacuum. The electric permittivity
and the magnetic permeability are set to the constant vacuum values. The exact
time-domain solution is given in [6].

We start our study by assuming that the penalization parameter $\tau$ is equal to 1. In
order to ensure the stability of the method, numerical CFL conditions are determined
for each value of the interpolation order $p_K$. In our particular case, $\varepsilon_K$ and
$\mu_K$ are constant and equal to 1 for all $K \in \mathcal{T}_h$, so we have verified that, as stated in Remark 3, for
$\tau = 1$ the values of the CFL number correspond to those of the classical upwind flux-based DG
method. In Table 1 we summarize the maximum $\Delta t$ obtained numerically to ensure
the stability of the scheme.

Given these values of $\Delta t$ max, the $L^2$-norm of the error is calculated for a
uniform tetrahedral mesh with 3072 elements which is constructed from a finite
difference grid with $n_x = n_y = n_z = 9$ points, each cell of this grid yielding
6 tetrahedra. The wave is propagated in the cavity during a physical time $t_{max}$
corresponding to 8 periods (as shown in Fig. 1). Figure 2 depicts a comparison of


Table 1 Numerically obtained values of Δt max

Interpolation order   P1             P2             P3             P4
Δt max (s)            0.32 × 10^-9   0.19 × 10^-9   0.13 × 10^-9   0.94 × 10^-10

Fig. 1 Time evolution of the exact and the numerical solution of E at point
A(0.25, 0.25, 0.25) with a P3 interpolation

Fig. 2 Time evolution of the L²-norm of the error for P4

Fig. 3 Numerical convergence order of the time explicit HDG method

the time evolution of the $L^2$-norm of the error between the solution obtained with
an HDG method and a classical upwind flux-based DG method for $p_K = 4$. An
optimal convergence with order $p_K + 1$ is obtained as shown in Fig. 3.
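The mesh size and the simulated time quoted for this test can be reproduced with a few lines of arithmetic. The sketch below assumes a cavity of side 1 m and the SI value of the speed of light; the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s), assumed SI value


def tetra_count(points_per_direction):
    """Number of tetrahedra when each cell of a uniform grid is split into 6."""
    cells = (points_per_direction - 1) ** 3
    return 6 * cells


def cavity_frequency(c0=C0):
    """Frequency f = sqrt(3)/2 * c0 of the eigenmode (unit side length assumed)."""
    return math.sqrt(3.0) / 2.0 * c0


def simulated_time(periods, c0=C0):
    """Physical time spanned by a given number of periods of the eigenmode."""
    return periods / cavity_frequency(c0)


print(tetra_count(9))       # element count for a 9 x 9 x 9 point grid
print(simulated_time(8))    # duration of 8 periods in seconds
```

Under these assumptions, eight periods last roughly 3 × 10^-8 s, which matches the time range visible on the horizontal axes of Figs. 1 and 2.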

Now, we keep the same case as previously and we assess the behavior of the
HDG method for various values of the penalization parameter $\tau$. We observe that,
for any order of interpolation and for values of the parameter $\tau \neq 1$, the
electromagnetic energy increases in time when the $\Delta t$ used is fixed to the values
defined in Table 1. In fact, it is necessary to decrease
$\Delta t$ max for each value of $\tau$ to ensure stability (see Table 2 and Fig. 4). For
this example, the optimal cost is obtained for $\tau = 1$ (the same cost
as an upwind flux DG method); any other value requires more time to complete
the simulation. In Fig. 5, we show the time evolution of the $L^2$-error for several
values of $\tau$, using the maximal time step for each considered parameter.
In addition, Table 3 sums up numerical results in terms of maximum $L^2$ errors and
convergence rates. It appears that the order of convergence is not affected when the
stabilization parameter is varied from 1 (with the associated CFL conditions).

Table 2 Numerically obtained values of the CFL number as a function of the stabilization
parameter τ for a P1 interpolation


z 50 100 
Atmax(s)  |031x10-!9 3.2x10-10 17x10-?  |0.66x10-!?  |0.32x10-!? 


Fig. 4 Variation of Δt max as a function of τ

Fig. 5 Time evolution of the L²-error as a function of τ with a P3 interpolation


Table 3 Maximum L2-errors and convergence orders 


Pi, At = 0.16 x 10-09 P», At = 0.99 x 10-19 Рз, At = 0.66 x 10-19 


8299-02 |- | |987e-03  |- |934e-04 
1.90e—02 1.34e—03 5.68e—05 
4.74e—03 1.72e—04 3.46е—06 4.04 


Pi, At = 0.16 x 10-19 Ро, At = 0.96 x 107!! Рз, At = 0.66 x 107!! 


2.4e-001 — |- | ]|178e-02  |- |2.19e-03 
5.46е—02 2.85e—03 1.68е—04 3.70 
1.18е—02 4.06е—04 1.14е—05 3.88 


Pi, At = 0.16 x 10-19 Py, At = 0.96 x 10-!! P5, At = 0.68 x 10-11 


174e-0] |- . |L53e-02  |- |1.68e-03 
4.24е—02 2.23e—03 1.17e—04 | 
9.4e-03 3.10e-04 7.81e-06 3.91

5 Local Postprocessing


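We define here, following the ideas of the local postprocessing developed in [1], new
approximations for the electric and magnetic fields and expect that both $E_h^{n*}$ and $H_h^{n*}$
converge with order $k + 1$ in the $H^{curl}(\mathcal{T}_h)$-norm, whereas $E_h^n$ and $H_h^n$ converge
with order $k$ in the $H^{curl}(\mathcal{T}_h)$-norm. To postprocess $E_h^{n*}$ we first compute an
approximation $(p_{1,h}^n, p_{2,h}^n) \in V(K) \times V(K)$ to the curl of $E$, $p_1(t^n) = \nabla \times E(t^n)$,
and the curl of $H$, $p_2(t^n) = \nabla \times H(t^n)$, by locally solving the system

$$
(p_{1,h}^n, v)_K = (E_h^n, \nabla \times v)_K - \langle \hat{E}_h^n, n \times v \rangle_{\partial K}, \quad \forall v \in V(K),
$$

and

$$
(p_{2,h}^n, v)_K = (H_h^n, \nabla \times v)_K - \langle \hat{H}_h^n, n \times v \rangle_{\partial K}, \quad \forall v \in V(K).
$$

We then find $(E_h^{n*}, H_h^{n*}) \in [\mathbb{P}_{k+1}(K)]^3 \times [\mathbb{P}_{k+1}(K)]^3$ such that

$$
\begin{aligned}
(\nabla \times E_h^{n*}, \nabla \times W)_K &= (p_{1,h}^n, \nabla \times W)_K, && \forall W \in [\mathbb{P}_{k+1}(K)]^3, \\
(E_h^{n*}, \nabla Y)_K &= (E_h^n, \nabla Y)_K, && \forall Y \in \mathbb{P}_{k+2}(K),
\end{aligned}
$$

and

$$
\begin{aligned}
(\nabla \times H_h^{n*}, \nabla \times W)_K &= (p_{2,h}^n, \nabla \times W)_K, && \forall W \in [\mathbb{P}_{k+1}(K)]^3, \\
(H_h^{n*}, \nabla Y)_K &= (H_h^n, \nabla Y)_K, && \forall Y \in \mathbb{P}_{k+2}(K).
\end{aligned}
$$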
It is important to point out that we can compute $E_h^{n*}$ and $H_h^{n*}$ at any time step without
advancing in time. Hence, the local postprocessing can be performed whenever we
need higher accuracy at particular time steps. Numerical results given in Table 4
show that a second order convergence rate is obtained for the post-processed
solution.
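The orders reported in Table 4 follow from two consecutive errors and the corresponding mesh sizes via the usual formula, order = log(e_coarse / e_fine) / log(h_coarse / h_fine). The sketch below applies it to the P1 entries of Table 4; the function name `observed_order` is hypothetical.

```python
import math


def observed_order(h_coarse, e_coarse, h_fine, e_fine):
    """Observed convergence order between two mesh sizes and their errors."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)


# P1 rows of Table 4, first error column (h = 1/4 and h = 1/6)
print(observed_order(1 / 4, 9.30e-01, 1 / 6, 5.84e-01))

# P1 rows of Table 4, second error column (h = 1/4 and h = 1/6)
print(observed_order(1 / 4, 6.83e-01, 1 / 6, 3.10e-01))
```

The two values come out close to 1 and close to 2 respectively, in line with the orders listed in the table for the first and second error columns.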


6 Conclusion 


In this paper we have presented an explicit HDG method to solve the system of
Maxwell equations in 3D. The next step is to couple explicit and implicit HDG
methods to treat the case of a locally refined mesh.

Table 4 Errors and convergence orders, τ = 10

              E_h                   E_h^*
P_k    h      Error      Order      Error      Order
P_1    1/4    9.30e-01   -          6.83e-01   -
       1/6    5.84e-01   1.14       3.10e-01   1.95
       1/8    4.34e-01   1.03       1.67e-01   2.15
P_2    1/4    1.67e-01   -          4.28e-02   -
       1/6    7.46e-02   1.98       1.19e-02   3.16
       1/8    4.29e-02   1.92       4.90e-03   3.06
P_3    1/4    2.30e-02   -          5.00e-03   -
       1/6    7.10e-03   2.90       1.10e-03   3.79
       1/8    3.00e-03   2.99       3.58e-04   3.84

Entropy Conserving and Kinetic Energy Preserving Numerical Methods
for the Euler Equations Using Summation-by-Parts Operators


Hendrik Ranocha 


1 Introduction 


Considering the solution of hyperbolic conservation laws, high order methods 
can be very efficient, providing accurate numerical solutions with relatively low 
computational effort [21]. In order to make use of this accuracy, stability has to be 
established. Mimicking estimates obtained on the continuous level via integration- 
by-parts, summation-by-parts (SBP) operators [22, 37] can be used. In short, SBP 
operators are discrete derivative operators equipped with a compatible quadrature 
providing a discrete analogue of the L? norm. The compatibility of discrete 
integration and differentiation mimics integration-by-parts on a discrete level. 
Combined with the weak enforcement of boundary conditions via simultaneous 
approximation terms (SATs) [1], highly efficient and stable semidiscretisations can 
be obtained at least for linear problems, see e.g. [6, 14, 39] and references cited 
therein. 

In recent years, there has been an enduring and increasing interest in the basic
ideas of SBP operators and their application in various frameworks including finite
volume (FV) [25, 26], discontinuous Galerkin (DG) [2, 4, 10, 11, 13, 20, 27, 28, 30],
and the recent flux reconstruction/correction procedure via reconstruction framework
[15, 16, 42] as described in [31, 32]. While there is only a limited amount
of well-posedness theory for nonlinear conservation laws, mimicking properties
such as entropy stability semidiscretely has received much interest. Building on the
seminal work of Tadmor [40, 41], entropy stability of second order schemes using
symmetric numerical fluxes has been investigated, resulting in well-defined properties
that numerical fluxes have to satisfy in order to result in entropy conservative
schemes. Decomposing general semidiscretisations into a non-dissipative central
part and an additional dissipative part, suitable artificial dissipation or filtering can
be added afterwards, cf. [7, 9, 38]. Second order methods based on symmetric
numerical fluxes can be extended to high order in a conservative way, cf. [4, 7, 28]
and [8, 23, 34-36].

Another property of numerical methods for the Euler equations that has received 
much interest in the literature concerns the kinetic energy. A structural property 
of numerical fluxes described by Jameson [18] has been used to construct so- 
called kinetic energy preserving (KEP) numerical fluxes inter alia by Chandrashekar 
[3]. However, schemes using these fluxes do not preserve the kinetic energy as 
expected in numerical experiments by Gassner et al. [12]. They had to change the 
discretisation of the pressure to reduce undesired changes of the kinetic energy. 
However, this resulted in a loss of entropy conservation. Motivated by these results, 
some analytical insights into this behaviour have been developed in [29, Section 7.4] 
and will be presented here. 

This chapter is structured as follows. At first, some basic results about SBP
operators and corresponding semidiscretisations of hyperbolic conservation
laws are reviewed in Sect. 2. Afterwards, the Euler equations are considered in
Sect. 3. After demonstrating that the property that has been used to characterise
numerical fluxes as KEP is not well-defined, the new concept of KEP numerical
methods is introduced. Moreover, a numerical flux that is both entropy
conservative and kinetic energy preserving in the new sense is developed.
Thereafter, results of a numerical experiment comparing entropy conservative
numerical fluxes are described in Sect. 4. Finally, a brief summary is given in
Sect. 5.


2 Discretisations Using Summation-by-Parts Operators 


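Consider the Euler equations in two space dimensions

\[
\partial_t \underbrace{\begin{pmatrix} \rho \\ \rho v_x \\ \rho v_y \\ \rho e \end{pmatrix}}_{= u}
+ \partial_x \underbrace{\begin{pmatrix} \rho v_x \\ \rho v_x^2 + p \\ \rho v_x v_y \\ (\rho e + p) v_x \end{pmatrix}}_{= f^x(u)}
+ \partial_y \underbrace{\begin{pmatrix} \rho v_y \\ \rho v_x v_y \\ \rho v_y^2 + p \\ (\rho e + p) v_y \end{pmatrix}}_{= f^y(u)}
= 0, \tag{1}
\]

where $\rho$ is the density, $v$ the velocity, $e$ the specific total energy, and $p$ the pressure.
For a perfect gas, $p = (\gamma - 1)(\rho e - \frac{1}{2} \rho v^2)$. The usual entropy is
$U = -\frac{\rho s}{\gamma - 1}$, where $s = \log p - \gamma \log \rho$ is the specific (physical) entropy.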

With the entropy fluxes $F^j$ fulfilling $\partial_u U \cdot \partial_u f^j = \partial_u F^j$, smooth solutions of
the Euler equations in $d$ space dimensions satisfy
$\partial_t U(u) + \sum_{j=1}^{d} \partial_j F^j(u) = 0$ and the entropy inequality

\[
\partial_t U(u) + \sum_{j=1}^{d} \partial_j F^j(u) \le 0 \tag{2}
\]

is used as additional admissibility criterion for weak solutions, cf. [5].

In order to discretise (1), the domain $\Omega$ is divided into several non-overlapping
sub-domains $\Omega_l \subseteq \Omega$ and SBP operators will be used on each element. SBP
operators consist of discrete derivative operators $D_j$, approximating the partial
derivative in direction $j$, and a symmetric and positive definite mass/norm matrix $M$,
approximating the $L^2(\Omega_l)$ scalar product via
$u^T M v = \langle u, v \rangle_M \approx \langle u, v \rangle_{L^2(\Omega_l)} = \int_{\Omega_l} u v$.
Moreover, an interpolation operator $R$ approximates the restriction of
functions on $\Omega_l$ to the boundary $\partial\Omega_l$ and a symmetric and positive definite
boundary mass matrix $B$ approximates the $L^2(\partial\Omega_l)$ scalar product. Representing
the multiplication by the $j$-th component of the outer unit normal $\nu$ at $\partial\Omega_l$ by the
diagonal matrix $N_j$, the SBP property

\[
M D_j + D_j^T M = R^T B N_j R \tag{3}
\]

has to be satisfied in order to mimic integration-by-parts discretely via

\[
\underbrace{u^T M D_j v + u^T D_j^T M v}_{\approx \int_{\Omega_l} u (\partial_j v) + \int_{\Omega_l} (\partial_j u) v}
= \underbrace{u^T R^T B N_j R v}_{\approx \int_{\partial\Omega_l} u v \nu_j}. \tag{4}
\]

Semidiscretisations of (1) are constructed as follows. Each sub-domain $\Omega_l \subseteq \Omega$
is mapped onto a reference element and all computations are performed there.
On each element, the resulting semidiscretisation is of the form

\[
\partial_t u + VOL + SURF = 0, \tag{5}
\]


where the volume terms VOL discretise the flux divergence in the interior of $\Omega_l$
and the surface terms SURF couple elements or impose boundary conditions.
Here, $u$ is the vector of the nodal values of the numerical solution at specified
nodes $\xi_i$ in $\Omega_l$ and a collocation approach is used. Thus, nonlinear operations are
performed pointwise and the discrete fluxes $f^j$ are given by their nodal values
$f^j_i = f^j(u_i) = f^j(u(\xi_i))$. As in (nodal) discontinuous Galerkin methods, the
surface terms will be built using numerical fluxes $f^{num,j}$ in the $j$-th coordinate
direction as

\[
SURF = \sum_{j=1}^{d} M^{-1} R^T B N_j \bigl( f^{num,j} - R f^j \bigr). \tag{6}
\]


Finally, the volume terms are constructed using symmetric (two-point) numerical
fluxes $f^{vol,j}$ (volume fluxes) that are consistent with $f^j$ as

\[
VOL_i = \sum_{j=1}^{d} \sum_{k} 2 [D_j]_{i,k} \, f^{vol,j}(u_i, u_k), \tag{7}
\]


where $VOL_i$ is the volume term at $\xi_i$ [7]. If $f^{vol,j}$ are smooth fluxes, the
discretisation (7) is of the same order of accuracy as the derivative matrices $D_j$
[4, 28]. Moreover, if the mass matrix $M$ is diagonal, this approximation can be
written in a conservative form [7]. Finally, if the boundary operators $R^T B N_j R$ are
also diagonal and $f^{vol,j}$ are entropy conservative in the sense of Tadmor [40, 41],
the semidiscretisation (5) is entropy conservative/stable across elements if the
numerical surface fluxes $f^{num,j}$ are entropy conservative/stable. Moreover, some
results on the kinetic energy can be transferred as well [12]. In the following, the
focus will lie on the fluxes $f^{vol,j}$.
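To illustrate the structure of the volume terms (7), the following sketch evaluates them in one space dimension. It assumes a second-order finite difference derivative matrix and the central two-point flux $(f(u_i) + f(u_k))/2$ for the scalar flux $f(u) = u^2/2$; both are illustrative choices, and the names `derivative_matrix`, `volume_terms`, and `central_flux` are hypothetical. Since the rows of $D$ sum to zero, the central flux reproduces the standard discretisation $D f$.

```python
import math


def derivative_matrix(n, h):
    """Second-order finite difference derivative matrix (illustrative SBP operator)."""
    D = [[0.0] * n for _ in range(n)]
    D[0][0], D[0][1] = -1.0 / h, 1.0 / h
    D[-1][-2], D[-1][-1] = -1.0 / h, 1.0 / h
    for i in range(1, n - 1):
        D[i][i - 1], D[i][i + 1] = -0.5 / h, 0.5 / h
    return D


def volume_terms(D, u, two_point_flux):
    """VOL_i = sum_k 2 D[i][k] f_vol(u_i, u_k), i.e. Eq. (7) in one dimension."""
    n = len(u)
    return [
        sum(2.0 * D[i][k] * two_point_flux(u[i], u[k]) for k in range(n))
        for i in range(n)
    ]


def central_flux(f):
    """Symmetric, consistent two-point flux built from the arithmetic mean of f."""
    return lambda a, b: 0.5 * (f(a) + f(b))


n, h = 11, 0.1
D = derivative_matrix(n, h)
u = [math.sin(i * h) for i in range(n)]
f = lambda value: 0.5 * value * value  # assumed scalar flux for illustration

vol = volume_terms(D, u, central_flux(f))
Df = [sum(D[i][k] * f(u[k]) for k in range(n)) for i in range(n)]
print(max(abs(a - b) for a, b in zip(vol, Df)))  # close to machine precision
```

Other symmetric two-point fluxes can be passed as `two_point_flux` without changing `volume_terms`.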


3 Euler Equations and Kinetic Energy 


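The kinetic energy $E_{kin} = \frac{1}{2} \rho v^2$ fulfils (for sufficiently smooth solutions)

\[
\partial_t E_{kin} + \operatorname{div}\bigl( \tfrac{1}{2} \rho v^2 v \bigr) + v \cdot \operatorname{grad} p = 0. \tag{8}
\]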


Jameson [18] investigated the kinetic energy in a one-dimensional semidiscrete 
setting using finite volume methods. To simplify the notation, this setup will be used 
in the following; its extension to multiple dimensions is straightforward. Jameson 
proposed to mimic (8) semidiscretely by using numerical momentum fluxes of the
form $f^{num}_{\rho v} = f^{num}_{\rho v}(u_-, u_+) = \{\{v\}\} f^{num}_{\rho} + p^{num}$, where $\{\{v\}\}$ is the arithmetic mean
of $v_-$ and $v_+$, $f^{num}_{\rho}$ is the numerical density flux, and $p^{num}$ is a consistent numerical
approximation of the pressure. Later, this has been used as a kind of "definition" of
kinetic energy preserving (KEP) numerical fluxes, e.g. in [3, 12]. However, this is
not a well-defined concept, cf. [28, 29]. Indeed, every numerical momentum flux
can be written as

\[
f^{num}_{\rho v} = \{\{v\}\} f^{num}_{\rho} + \underbrace{\bigl( f^{num}_{\rho v} - \{\{v\}\} f^{num}_{\rho} \bigr)}_{=: \, p^{num}}. \tag{9}
\]

Since the numerical fluxes are consistent, $p^{num} := f^{num}_{\rho v} - \{\{v\}\} f^{num}_{\rho}$ is a
consistent approximation of the pressure. The insufficiency of the condition
$f^{num}_{\rho v} = \{\{v\}\} f^{num}_{\rho} + p^{num}$ is in accordance with observations of Gassner et al. [12]. They
investigated a Taylor-Green vortex problem and compared several numerical fluxes
for the Euler equations. There, numerical fluxes of the form
$f^{num}_{\rho v} = \{\{v\}\} f^{num}_{\rho} + p^{num}$ with $p^{num} \neq \{\{p\}\}$ resulted in a clear loss of kinetic energy compared to other
KEP fluxes using the arithmetic average $p^{num} = \{\{p\}\}$ as approximation of the
pressure. They observed that "the discretisation of the pressure plays a crucial role
for the kinetic energy" and that the choice of the arithmetic average $p^{num} = \{\{p\}\}$
"seems to be important for the kinetic energy equation" [12, Section 4.2]. However,
they had no (theoretical) explanations for this observation.


3.1 New Approach to Kinetic Energy Preservation 


By a heuristic argument, the balance law (8) may not be suitable in the incompressible
limit: Indeed, for smooth solutions, (8) can be rewritten as

\[
\partial_t E_{kin} + \operatorname{div}\bigl( \tfrac{1}{2} \rho v^2 v + p v \bigr) - p \operatorname{div} v = 0, \tag{10}
\]

which becomes a conservation law for smooth solutions of the incompressible
Euler equations due to $\operatorname{div}(v) = 0$ or an energy inequality similar to the entropy
inequality (2). Since the kinetic energy plays a crucial role in the incompressible
limit [24], the second form (10) might be considered the "better" one. Thus, a
semidiscretisation mimicking this equation might be desirable near the incompressible
limit.

Definition 1 A numerical flux $f^{num} = (f^{num}_{\rho}, f^{num}_{\rho v}, f^{num}_{\rho e})$ for the Euler equations
is called kinetic energy preserving (KEP), if the momentum flux can be written as
$f^{num}_{\rho v} = \{\{v\}\} f^{num}_{\rho} + \{\{p\}\}$.

Definition 1 results in a well-defined concept of KEP numerical fluxes. 


Theorem 1 (Corollary 7.5 of [29]) If a kinetic energy preserving numerical flux is
used in a semidiscrete FV method, the resulting semidiscrete kinetic energy equation
mimics both the conservative and the non-conservative terms of Eq. (10).


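Proof (Sketch) Using the chain rule in a one dimensional finite volume setting, the
time derivative of the kinetic energy in cell $i$ becomes

\[
\partial_t \bigl( \tfrac{1}{2} \rho_i v_i^2 \bigr)
= - \underbrace{\frac{1}{\Delta x} \Bigl[ \bigl( \tfrac{1}{2} \rho v^2 v + p v \bigr)^{num}(u_i, u_{i+1})
  - \bigl( \tfrac{1}{2} \rho v^2 v + p v \bigr)^{num}(u_{i-1}, u_i) \Bigr]}_{\approx \, \partial_x ( \frac{1}{2} \rho v^2 v + p v )}
+ \underbrace{p_i \frac{v_{i+1} - v_{i-1}}{2 \Delta x}}_{\approx \, p \, \partial_x v},
\]

where $\bigl( \tfrac{1}{2} \rho v^2 v + p v \bigr)^{num}(u_i, u_j) = \tfrac{1}{2} v_i v_j f^{num}_{\rho}(u_i, u_j) + \tfrac{1}{2} (p_i v_j + p_j v_i)$. □

Using the momentum flux $f^{num}_{\rho v} = \{\{v\}\} f^{num}_{\rho} + \{\{p\}\}$ in the volume terms (7) in
one dimension, the arithmetic average of the pressure yields the volume term $D p$,
i.e. a straightforward discretisation of $\partial_x p$. Analogous results hold in multiple space
dimensions, cf. Sect. 2.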
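The statement of Theorem 1 can be checked numerically. The sketch below assumes a periodic one-dimensional finite volume grid and, purely for illustration, the density flux $f^{num}_{\rho} = \{\{\rho\}\} \{\{v\}\}$; the momentum flux follows Definition 1. It compares the kinetic energy rate obtained from the chain rule, $v \, \partial_t(\rho v) - \frac{1}{2} v^2 \partial_t \rho$, with a conservative flux difference plus the discrete analogue of $p \operatorname{div} v$. The function name `kinetic_energy_rates` and the test data are hypothetical.

```python
import math


def kinetic_energy_rates(rho, v, p, dx):
    """Compare d/dt(rho v^2 / 2) from the FV scheme with the discrete form of Eq. (10)."""
    n = len(rho)
    mean = lambda a, b: 0.5 * (a + b)

    def f_rho(a, b):  # assumed density flux (any symmetric consistent choice works)
        return mean(rho[a], rho[b]) * mean(v[a], v[b])

    def f_mom(a, b):  # KEP momentum flux in the sense of Definition 1
        return mean(v[a], v[b]) * f_rho(a, b) + mean(p[a], p[b])

    def f_kin(a, b):  # numerical flux of the kinetic energy balance
        return 0.5 * v[a] * v[b] * f_rho(a, b) + 0.5 * (p[a] * v[b] + p[b] * v[a])

    direct, balance = [], []
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        drho = -(f_rho(i, right) - f_rho(left, i)) / dx
        dmom = -(f_mom(i, right) - f_mom(left, i)) / dx
        # chain rule: d(m^2 / (2 rho)) = v dm - v^2 / 2 drho
        direct.append(v[i] * dmom - 0.5 * v[i] ** 2 * drho)
        # conservative part plus discrete p * dv/dx
        balance.append(
            -(f_kin(i, right) - f_kin(left, i)) / dx
            + p[i] * (v[right] - v[left]) / (2.0 * dx)
        )
    return direct, balance


n, dx = 8, 0.1
rho = [1.0 + 0.3 * math.sin(0.7 * i) for i in range(n)]
v = [0.5 * math.cos(1.3 * i) for i in range(n)]
p = [1.0 + 0.2 * math.cos(0.4 * i) for i in range(n)]
direct, balance = kinetic_energy_rates(rho, v, p, dx)
print(max(abs(a - b) for a, b in zip(direct, balance)))  # close to machine precision
```

The identity holds for arbitrary positive data because it only uses the structure of the momentum flux, not the particular density flux.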

The kinetic energy preserving DG methods presented in [11, 27] use volume
terms corresponding to the numerical fluxes $f^{num}_{\rho} = \{\{\rho v\}\}$,
$f^{num}_{\rho v} = \{\{\rho v\}\} \{\{v\}\} + \{\{p\}\}$, which are kinetic energy preserving in the sense of Definition 1.


3.2 Entropy Conservative and KEP Numerical Fluxes 


Since entropy stability has received much interest and the entropy conservative 
numerical fluxes of [3, 17] are not KEP in the sense of Definition 1, it is interesting 
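whether both concepts can be fulfilled simultaneously. The logarithmic mean
value $\{\{\rho\}\}_{\log} = (\rho_+ - \rho_-) / (\log \rho_+ - \log \rho_-)$ has been proposed by Roe [33] in the context
of entropy conservative numerical fluxes and is described in [17]. Many useful
entropy conservative numerical density fluxes are of the form $f^{num}_{\rho} = \{\{\rho\}\}_{\log} \{\{v\}\}$,
e.g. the one presented in [3]. This form seems to be preferable, since positivity
preservation of the density can be achieved using local Lax-Friedrichs/Rusanov
dissipation operators [28, Section 6.2]. Using this ansatz for $f^{num}_{\rho}$ and Definition 1,
the following entropy conservative and kinetic energy preserving numerical flux
($f^{num,y}$ analogously) has been constructed in [29, Section 7.4]

\[
\begin{aligned}
f^{num,x}_{\rho} &= \{\{\rho\}\}_{\log} \{\{v_x\}\}, \qquad
f^{num,x}_{\rho v_x} = \{\{v_x\}\} f^{num,x}_{\rho} + \{\{p\}\}, \qquad
f^{num,x}_{\rho v_y} = \{\{v_y\}\} f^{num,x}_{\rho}, \\
f^{num,x}_{\rho e} &= \Bigl( \{\{v_x\}\}^2 + \{\{v_y\}\}^2 - \tfrac{1}{2} \bigl( \{\{v_x^2\}\} + \{\{v_y^2\}\} \bigr)
  + \frac{1}{(\gamma - 1) \{\{\rho / p\}\}_{\log}} \Bigr) f^{num,x}_{\rho}
  + 2 \{\{p\}\} \{\{v_x\}\} - \{\{p v_x\}\}.
\end{aligned} \tag{11}
\]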

4 Numerical Results 


Since the kinetic energy is an important quantity for the incompressible Euler
equations, a Taylor-Green vortex given by

\[
\begin{aligned}
\rho(t, x, y) &= 1, & v_x(t, x, y) &= \sin(x) \cos(y), \\
v_y(t, x, y) &= -\cos(x) \sin(y), & p(t, x, y) &= \frac{100}{\gamma} + \frac{\cos(2x) + \cos(2y)}{4}
\end{aligned} \tag{12}
\]

for $(x, y) \in [0, 2\pi]^2$ with periodic boundary conditions is considered, which is
a stationary solution of the incompressible Euler equations. Using tensor product
Lobatto bases for polynomials of degree $p = 5$ on $N = 16$ elements per
coordinate direction, the numerical solutions have been computed in the time
interval $t \in [0, 30]$ with the fourth order, ten-stage, strong stability preserving
Runge-Kutta method of [19]. The time step $\Delta t$ has been chosen as
$\Delta t = \mathrm{cfl} \cdot \min \{ \Delta x / ((p + 1) \lambda) \}$, where $\lambda$ is the greatest absolute value of the eigenvalues
of $f'$ and the minimum is taken over all cells and nodes. As in [12], the given
numerical fluxes have been used for both the volume terms (7) and as surface fluxes
in (6), without additional dissipation.

The evolution of the entropy $U$ and the kinetic energy $E_{kin}$ using a CFL number
cfl = 0.9 for the entropy conservative fluxes of Ismail and Roe [17], Chandrashekar
[3], and the new flux (11) is visualised in Fig. 1. As can be seen there, the entropy
remains approximately constant and the kinetic energy oscillates uniformly until
$t \approx 20$. Afterwards, the kinetic energy drops for the fluxes of [3, 17] and there is
a relative change of the entropy of order $10^{-5}$. In contrast, there is no visible change
for the new flux (11).

The entropy loss for the fluxes of Ismail and Roe [17] and Chandrashekar [3] is 
caused by the time integration scheme, as can be seen in Fig. 2, where the time step 
is reduced by an order of magnitude (cfl = 0.09). However, the behaviour of the
kinetic energy is nearly unchanged. 


Fig. 1 Total entropy and kinetic energy of numerical solutions using different entropy conservative
numerical fluxes with cfl = 0.9 (plotted over time t for the numerical flux of Chandrashekar,
the numerical flux of Ismail & Roe, and the new numerical flux (KEP & EC))

Fig. 2 Total entropy and kinetic energy of numerical solutions using different entropy conservative
numerical fluxes with cfl = 0.09


5 Summary and Discussion 


Using summation-by-parts operators, high order numerical schemes with specific 
properties can be constructed using symmetric (two-point) numerical fluxes. While 
several "kinetic energy preserving" methods have been proposed, they have been 
characterised by a property of the numerical fluxes that is not well-defined. Such 
numerical fluxes resulted in schemes that did not preserve the kinetic energy as 
expected [12]. Here, a new approach to kinetic energy preservation inspired by 
the incompressible Euler equations and developed in [29, Section 7.4] has been 
described. This results in a well-defined property numerical fluxes have to satisfy
in order to mimic the balance law for the kinetic energy more reliably. Moreover, new
entropy conservative numerical fluxes have been developed that are kinetic energy 
preserving in the new sense. 
Multiwavelet Troubled-Cell Indication: A Comparison of Utilizing Theory Versus
Outlier Detection


Mathea J. Vuik 


1 Introduction 


Solutions to nonlinear hyperbolic PDEs develop discontinuities in time. The 
generation of spurious oscillations in such regions can be prevented by applying 
a limiter in the troubled zones. In [16, 18], two different multiwavelet troubled- 
cell indicators were introduced, one based on a parameter, the other using outlier 
detection. We present this comparison in order to begin to understand in which 
regime these tools are effective. In this paper, we investigate the effectiveness of 
a different detection scheme, based on the theoretical detection of troubled cells 
using multiwavelet approaches. It uses the cancelation property [6] and the theory 
about thresholding [8]. This technique was originally used for a multiwavelet- 
based adaptive strategy in combination with the DG method. However, we are 
specifically interested in its application for troubled-cell indication. In the troubled 
cells, the moment limiter is applied [11]. We demonstrate the performance of this 
new indicator and show that it works very well when very fine meshes are used 
(the asymptotic regime). For coarser meshes, it seems that the existing multiwavelet 
troubled-cell indicators perform better. 

The outline of this paper is as follows: in Sect. 2, some background information 
about the multiwavelet theory is given. The existing multiwavelet troubled-cell indi- 
cators, as well as the cancelation property and the derived thresholding technique are 
described in Sect. 3. Numerical results are shown in Sect. 4, and some concluding 
remarks are given in Sect. 5. 
2 Multiwavelets and DG


In this section, we consider the multiwavelet theory that is used to design the 
different troubled-cell indicators. For the sake of brevity, we omit a discussion of
the DG scheme [4, 5] that is used in the computations.

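The relation between the DG scheme and multiwavelets was shown in [16]. Any
global one-dimensional DG approximation of degree $k$ can be written as

\[
u_h(x) = 2^{-n/2} \sum_{j=0}^{2^n - 1} \sum_{\ell=0}^{k} u_j^{(\ell)} \phi^n_{\ell j}(x),
\]

where $\phi_\ell$ are the scaling functions related to the orthonormal Legendre polynomials
and $\phi^n_{\ell j}$ denotes their dilated and translated version on element $j$ of level $n$.
The corresponding multiwavelet decomposition is

\[
u_h(x) = \sum_{\ell=0}^{k} s^0_{\ell 0} \phi_\ell(x)
  + \sum_{m=0}^{n-1} \sum_{j=0}^{2^m - 1} \sum_{\ell=0}^{k} d^m_{\ell j} \psi^m_{\ell j}(x),
\]

where $s^0_{\ell 0}$ are the scaling-function coefficients belonging to $u_h$, and $d^m_{\ell j}$ are the
corresponding multiwavelet coefficients [2, 16]. The multiwavelets we use have been
developed by Alpert [1].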


3 Utilizing Multiwavelet Coefficients for Troubled-Cell 
Indication 


In this section, we show different troubled-cell indicators that utilize multiwavelet 
coefficients. Note that, as the detectors are solely based on the underlying approx- 
imation space, the ideas do not need to be modified in order to be applied to 
other types of model problems than those included in this paper. First, the existing 
indicators that use either a parameter or the boxplot method are presented. Next, 
the cancelation property and thresholding technique are used to design a different 
indication technique. 


3.1 Boxplots for Outlier Detection 


In [16, 17], we have shown that the coefficients $d^{n-1}_{kj}$ are very useful for troubled-cell
indication. With this knowledge, we have designed two different troubled-cell
indicators. The first indicator is the so-called parameter-based multiwavelet
troubled-cell indicator [16]. Here, we detect an element as troubled when

\[
|d^{n-1}_{kj}| > C \cdot \max \bigl\{ |d^{n-1}_{kj}|, \; j = 0, \ldots, 2^n - 1 \bigr\}, \quad C \in [0, 1]. \tag{1}
\]


The value of C is a useful tool to prescribe the strictness of the limiter. 
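The parameter-based criterion (1) amounts to a comparison with a fraction of the largest coefficient. A minimal sketch, assuming the highest-level coefficients are supplied as a plain list with one entry per element (the function name `parameter_based_indicator` and the sample data are hypothetical):

```python
def parameter_based_indicator(coefficients, C):
    """Flag element j if |d_j| > C * max_j |d_j|, cf. criterion (1)."""
    if not 0.0 <= C <= 1.0:
        raise ValueError("C is expected to lie in [0, 1]")
    threshold = C * max(abs(d) for d in coefficients)
    return [abs(d) > threshold for d in coefficients]


coefficients = [0.01, -0.02, 0.9, 0.015]
print(parameter_based_indicator(coefficients, C=0.1))
# [False, False, True, False]
```

With C = 1 no element is flagged because the inequality is strict, whereas C = 0 flags every element with a non-zero coefficient.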

Another option is to use outlier detection on the multiwavelet coefficients $d^{n-1}_{kj}$
to detect the troubled cells [18]. Here, Tukey's boxplot method [14] is applied
locally to prevent the need for a problem-dependent parameter. The different steps
are presented in Algorithm 1.


Algorithm 1 Outlier-detection algorithm using local vectors

Send in a suitable troubled-cell indication vector $D$.
Split this vector into local vectors, $d$.
for all local vectors do
    Sort $d$ to obtain $d^s$.
    Compute the quartiles $Q_1$ and $Q_3$.
    Detect $d^s_j$ in the smallest 25% of $d^s$ if $d^s_j < Q_1 - 3(Q_3 - Q_1)$, and $d^s_j$ in the
    biggest 25% of $d^s$ if $d^s_j > Q_3 + 3(Q_3 - Q_1)$.
end for
Ignore the detected outliers in the left half of the local region when they are not detected with
respect to the left-neighboring vector, and similarly test the detected coefficients in the right half
of the local region.
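The loop of Algorithm 1 can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the text: quartiles are computed by linear interpolation between order statistics, and the final cross-check against neighboring local vectors is omitted. The function names `quartiles` and `detect_outliers` and the sample data are hypothetical.

```python
def quartiles(sorted_values):
    """First and third quartile by linear interpolation (one of several conventions)."""
    n = len(sorted_values)

    def quantile(fraction):
        position = fraction * (n - 1)
        lower = int(position)
        upper = min(lower + 1, n - 1)
        weight = position - lower
        return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight

    return quantile(0.25), quantile(0.75)


def detect_outliers(indication_vector, local_length=16):
    """Flag entries outside the outer fences of their local vector."""
    flagged = [False] * len(indication_vector)
    for start in range(0, len(indication_vector), local_length):
        local = indication_vector[start:start + local_length]
        q1, q3 = quartiles(sorted(local))
        lower_fence = q1 - 3.0 * (q3 - q1)
        upper_fence = q3 + 3.0 * (q3 - q1)
        for offset, value in enumerate(local):
            if value < lower_fence or value > upper_fence:
                flagged[start + offset] = True
    return flagged


# smooth background values with a single large coefficient at index 20
data = [0.01 * (i % 5) for i in range(32)]
data[20] = 5.0
flags = detect_outliers(data)
print([i for i, flag in enumerate(flags) if flag])  # [20]
```

A value below the lower fence automatically lies in the smallest 25% of the sorted vector, and a value above the upper fence in the biggest 25%, so no separate test for the quarter is needed.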


Outliers are the coefficients in the vector that are straying far out beyond the
others. In order to pick out certain coefficients as outliers, the outer fences are
constructed, which were originally defined by Tukey [14]. The outer fences of a
vector are $[Q_1 - 3(Q_3 - Q_1), Q_3 + 3(Q_3 - Q_1)]$ (coefficients outside are called
extreme outliers). The coverage for this whisker length is 99.9998%, such that only
0.0002% of the data in a normally distributed vector is detected as an extreme outlier
(asymptotically) [9].
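The quoted coverage can be reproduced for a standard normal distribution with the Python standard library. In this short sketch, the function name `extreme_outlier_fraction` is a hypothetical helper; it evaluates the probability mass outside the outer fences.

```python
from statistics import NormalDist


def extreme_outlier_fraction(whisker=3.0):
    """Probability that a standard normal sample lies outside the outer fences."""
    normal = NormalDist()
    q3 = normal.inv_cdf(0.75)
    q1 = -q3  # symmetry of the normal distribution
    upper_fence = q3 + whisker * (q3 - q1)
    return 2.0 * normal.cdf(-upper_fence)


fraction = extreme_outlier_fraction()
print(f"outside: {100 * fraction:.4f}%, coverage: {100 * (1 - fraction):.4f}%")
```

The fences lie at roughly 4.72 standard deviations, which gives an outlier fraction of about 0.0002%.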

In our computations, we always use local vectors of length 16. 


3.2 Cancelation Property 


In this section, the cancelation property is stated and proved for the one-dimensional
case [6]. Here, we assume that the multiwavelets have $M + 1$ vanishing moments.
In our case, we have $M = \ell + k$ [1, 15]. If the solution satisfies the continuity
requirement $u|_{I^m_j} \in C^{M+1}(I^m_j)$ (where $I^m_j$ is the $j$-th element in level $m$), then

\[
|d^m_{\ell j}| \le \frac{1}{(M+1)!} \, \| u^{(M+1)} \|_{L^\infty(I^m_j)} \cdot 2^{(-m+1)(M+3/2)}, \tag{2}
\]

$m = 0, \ldots, n$, $j = 0, \ldots, 2^m - 1$, $\ell = 0, \ldots, k$.
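The proof uses a Taylor expansion of $u$ about the element center $x^m_j$: there exists a $\xi$
between $x$ and $x^m_j$ such that

\[
u(x) = u(x^m_j) + u'(x^m_j) (x - x^m_j) + \ldots + \frac{u^{(M)}(x^m_j)}{M!} (x - x^m_j)^M
  + \frac{u^{(M+1)}(\xi)}{(M+1)!} (x - x^m_j)^{M+1}.
\]

Using that the first $M + 1$ moments of the multiwavelets vanish, we find

\[
\begin{aligned}
|d^m_{\ell j}| = \bigl| \langle u, \psi^m_{\ell j} \rangle_{I^m_j} \bigr|
&= \Bigl| \Bigl\langle \frac{u^{(M+1)}(\xi)}{(M+1)!} (x - x^m_j)^{M+1}, \psi^m_{\ell j} \Bigr\rangle_{I^m_j} \Bigr| \\
&\le \frac{1}{(M+1)!} \, \| u^{(M+1)} \|_{L^\infty(I^m_j)} \,
  \bigl\langle |x - x^m_j|^{M+1}, |\psi^m_{\ell j}| \bigr\rangle_{I^m_j}.
\end{aligned} \tag{3}
\]

Next, we use Cauchy-Schwarz's inequality to find

\[
\bigl\langle |x - x^m_j|^{M+1}, |\psi^m_{\ell j}| \bigr\rangle_{I^m_j}
\le \| (x - x^m_j)^{M+1} \|_{L^2(I^m_j)} \, \| \psi^m_{\ell j} \|_{L^2(I^m_j)}
= \| (x - x^m_j)^{M+1} \|_{L^2(I^m_j)},
\]

because the multiwavelets are orthonormal. Using the notation $\Delta x^m$ for the element
size in level $m$, we have

\[
\| (x - x^m_j)^{M+1} \|_{L^2(I^m_j)} \le (\Delta x^m)^{M+1} \, \| 1 \|_{L^2(I^m_j)}
= (\Delta x^m)^{M+1} \sqrt{\Delta x^m} = (\Delta x^m)^{M+3/2}.
\]

For the domain $[-1, 1]$, we have $\Delta x^m = 2^{-m+1}$. This means that

\[
\| (x - x^m_j)^{M+1} \|_{L^2(I^m_j)} \le 2^{(-m+1)(M+3/2)},
\]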

which proves the cancelation property. It should be noticed that this result can be 
generalized to general grid hierarchies and higher-dimensional problems [6, 10]. 
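The bound (2) can be illustrated in the simplest setting. The sketch below assumes piecewise constant approximations ($k = 0$), for which the multiwavelet is the Haar wavelet on $[-1, 1]$ with one vanishing moment ($M = 0$), and the smooth function $u(x) = \sin(x)$ with $\| u' \|_{L^\infty} \le 1$. The function names are hypothetical, and the coefficients are evaluated exactly via the antiderivative $-\cos(x)$.

```python
import math


def haar_coefficient(m, j, antiderivative):
    """Coefficient <u, psi^m_j> for the orthonormal Haar wavelet on element j of level m."""
    h = 2.0 ** (-m + 1)  # element size on level m for the domain [-1, 1]
    left = -1.0 + j * h
    middle, right = left + 0.5 * h, left + h
    scale = 2.0 ** (m / 2.0) / math.sqrt(2.0)  # L2 normalisation of psi^m_j
    right_half = antiderivative(right) - antiderivative(middle)
    left_half = antiderivative(middle) - antiderivative(left)
    return scale * (right_half - left_half)


def cancelation_bound(m, M=0, derivative_bound=1.0):
    """Right-hand side of (2): |u^(M+1)|_inf / (M+1)! * 2^((-m+1)(M+3/2))."""
    return derivative_bound / math.factorial(M + 1) * 2.0 ** ((-m + 1) * (M + 1.5))


antiderivative = lambda x: -math.cos(x)  # antiderivative of u(x) = sin(x)
for m in range(5):
    largest = max(abs(haar_coefficient(m, j, antiderivative)) for j in range(2 ** m))
    print(m, largest, cancelation_bound(m))
```

Each refinement level reduces the largest coefficient by roughly a factor of $2^{3/2}$, in line with the exponent $M + 3/2$.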

The next section contains a discussion of the thresholding technique for one- 
dimensional multiwavelet expansions. 


3.3 Thresholding of the Multiwavelet Coefficients 


In this section, the thresholding technique for systems of conservation laws in 
one dimension is explained, which is based on the cancelation property [8]. 
This technique was originally used for a multiwavelet-based adaptive strategy in
combination with the DG method. However, we are specifically interested in its 
application for troubled-cell indication. 
Following [8], the element $I^n_j$ is detected as troubled if

\[
\max_{\ell} \frac{|d^{n-1}_{\ell j}(r)|}{\max \bigl\{ \max_{i = 0, \ldots, 2^n - 1} 2^{(n-1)/2} |s^n_{0 i}(r)|, \, 1 \bigr\}}
> \varepsilon_{n-1} \sqrt{2 \Delta x}.
\]

Here, the value $r$ is related to the conserved quantity in a system of three PDEs.
The factor $\sqrt{2 \Delta x}$ (with $\Delta x$ the DG mesh width) occurs because of a scaling
difference: the multiwavelets in [8] are scaled with respect to the $L^\infty$-norm, whereas
an $L^2$-norm scaling is used in this paper. The level-dependent threshold value $\varepsilon_{n-1}$
is chosen as $\varepsilon_{n-1} = \varepsilon / 2$. The parameter $\varepsilon$ can be chosen using two different
strategies [8]. The first option is to use the a priori strategy, which is based on the
balance between discretization errors and perturbation errors of adaptive meshes
[10]. If the solution contains discontinuities, then the a priori strategy leads to
$\varepsilon = C \Delta x^2$. The second option is the heuristic approach, which is based on
numerous computations for practical applications [8]. This method is more efficient
since it is less pessimistic than the a priori strategy. For discontinuous solutions, the
heuristic approach uses $\varepsilon = C \Delta x$.
This yields detection of element $I^n_j$ if

\[
\max_{\ell} \frac{|d^{n-1}_{\ell j}(r)|}{\max \bigl\{ \max_{i = 0, \ldots, 2^n - 1} 2^{(n-1)/2} |s^n_{0 i}(r)|, \, 1 \bigr\}}
> \tfrac{1}{2} \sqrt{2} \, C \, \Delta x^{\beta + 0.5},
\]

where $\beta = 2$ for the a priori strategy and $\beta = 1$ for the heuristic strategy. Note that
the multiwavelet coefficients are scaled by the cell average if this value is greater
than 1 in absolute value (to prevent division by zero).
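The thresholding test can be sketched as follows, under the assumptions stated in the text: $\varepsilon = C \Delta x^{\beta}$, $\varepsilon_{n-1} = \varepsilon / 2$, the scaling factor $\sqrt{2 \Delta x}$, and a normalisation by the largest cell average whenever it exceeds 1 in absolute value. The data layout (one list of coefficients per element, one cell average per element) and the function name `threshold_indicator` are hypothetical choices for illustration.

```python
import math


def threshold_indicator(coefficients, cell_averages, dx, C, beta=1.0):
    """Flag element j if max_l |d_lj| / scale > 0.5 * sqrt(2) * C * dx**(beta + 0.5).

    coefficients: list over elements, each a list of multiwavelet coefficients (over l)
    cell_averages: cell averages of the conserved quantity under consideration
    beta: 2 for the a priori strategy, 1 for the heuristic strategy
    """
    scale = max(max(abs(a) for a in cell_averages), 1.0)
    epsilon = C * dx ** beta
    threshold = 0.5 * epsilon * math.sqrt(2.0 * dx)  # eps_{n-1} * sqrt(2 dx)
    return [max(abs(d) for d in element) / scale > threshold for element in coefficients]


coefficients = [[1.0e-5, 2.0e-5], [1.0e-2, 3.0e-6]]
cell_averages = [0.5, 2.0]
print(threshold_indicator(coefficients, cell_averages, dx=0.01, C=0.5, beta=1.0))
# [False, True]
```

Switching to `beta=2.0` lowers the threshold by a factor of `dx`, so the a priori strategy flags more elements than the heuristic one for the same value of C.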

The optimal choice of the parameter C depends on the problem, in particular on 
the strength of the shock compared to the normal amplitude of the solution. The 
smaller C is, the more elements are detected. In general, the value $C = 1/(b - a)$
should work for the domain [a, b] [8]. If C is chosen too small, then too many cells 
are detected as troubled. For the adaptive strategy, this is not really problematic since 
the approximation is usually more accurate on a finer grid. However, for troubled- 
cell indication, it is important to detect the correct number of elements. 

It should be noticed that this indicator is designed for very fine resolutions (since 
the strategies use asymptotic arguments). For coarse meshes, smaller values of C 
should be used, which are difficult to predict a priori. 
3.4 Generalized Grids


The algorithm for utilizing Alpert's multiwavelets for a nonuniform grid is given in 
[7]: the only difference with Alpert's algorithm [1] is that no additional vanishing 
moments are added. Multiwavelets for one-dimensional irregular meshes have been 
designed in [12, 13]. It should be noticed that this construction is local, which
means that the resulting bases depend on the level and the position unless
there is an affine mapping from the element to a reference element. This leads to
slower computations. On the other hand, the use of such a multiwavelet space makes
it possible to decompose the DG approximation into a multiwavelet expansion exactly.
The multiwavelet coefficients will again become small if the underlying function is 
smooth, and the mesh width between two neighboring elements is not varying too 
much. 

When coupled with a troubled-cell indication variable, it will be necessary to 
include spatial information of the mesh in the algorithm using the element size. 
Alternatively, one can use a window-based technique [3]. A window is a fixed
length subsequence of the test sequence, which can be slid through the domain using 
a sliding step. These issues and resulting numerics are discussed further in [15]. 


4 Numerical Results 


In this section, the different multiwavelet troubled-cell indicators are applied to one- 
dimensional problems based on the Euler equations of gas dynamics. 

The results for the original multiwavelet troubled-cell indicators (both based on a 
parameter, and based on outlier detection), can be seen in Figs. 1 and 2 (polynomial 
degree 2, 128 elements for Sod's and Lax's shock tube, and 512 elements for the 
blast-wave and Shu-Osher problem). The parameter-based technique performs well 
if a suitable value for the problem-dependent parameter C is chosen. The outlier- 
detection results are generally better than the original troubled-cell indicator using 
an optimized parameter: both the weak and the strong shock regions were detected, 
whereas smooth regions were not selected. 

It is also possible to use the thresholding technique for multiwavelet coefficients 
to detect troubled cells. It turns out that this indicator works very well as long 
as an appropriate value for C is chosen, and the mesh is taken fine enough. The 
results for the different test cases are visualized in Fig. 3 using the heuristic strategy
(polynomial degree 2, 1024 elements for all models). Here, we take the value 
C = 1/(b — a) where [a, b] is the domain on which the test problem is defined. 
Note that this thresholding technique is very accurate. However, many elements 
should be used to meet the asymptotic properties of the indicator. 
Fig. 1 Time-history plot of detected troubled cells using the parameter-based multiwavelet
troubled-cell indicator, polynomial degree 2. (a) Sod's shock tube, C = 0.1, 128 elements. (b)
Lax's shock tube, C = 0.1, 128 elements. (c) Blast-wave problem, C = 0.05, 512 elements. (d)
Shu-Osher, C = 0.01, 512 elements


If the number of elements is taken smaller, then C should decrease to detect the 
correct features. In that case, it is difficult to guess the correct value of C. Another 
option is to use the a priori strategy for coarser meshes, see Fig. 4 (polynomial 
degree 2, 128 elements for Sod's and Lax's shock tube, and 512 elements for the 
blast-wave and Shu-Osher problem). If $C = 1/(b - a)$ is used, then this approach
works well for Sod's and Lax's shock tube, but too many elements are detected for 
the blast-wave and the Shu-Osher problem. Also here, the value of C should be 
adapted to find the correct results. 
Fig. 2 Time-history plot of detected troubled cells using the outlier-detection multiwavelet
troubled-cell indicator, polynomial degree 2. (a) Sod's shock tube, 128 elements. (b) Lax's shock 
tube, 128 elements. (c) Blast-wave problem, 512 elements. (d) Shu-Osher problem, 512 elements 


5 Conclusions and Recommendations 


In this paper, a new troubled-cell indicator was formed, based on the cancelation 
property for multiwavelets and the derived thresholding technique. Inspection of 
this technique reveals that it is very useful to design adaptive meshes [8]. For 
troubled-cell indication, we found out that detection is very accurate as long as a 
very fine mesh is used. For coarser meshes, it seems to be more useful to apply a 
different detection method. Furthermore, it is not straightforward how to choose the 
parameter C. 

More research should be done to see in which way the cancelation property for 
multiwavelet coefficients can be used for the accurate detection of troubled cells. 
For example, it could be that this property also relates to the severity of the shocks. 
Fig. 3 Thresholding technique with heuristic approach: time-history plot of detected troubled
cells, 1024 elements, polynomial degree 2, C = 1/(b - a), with [a, b] the computational domain.
(a) Sod's shock tube. (b) Lax's shock tube. (c) Blast-wave problem. (d) Shu-Osher problem

Fig. 4 Thresholding technique with a priori approach on coarser meshes: time-history plot of
detected troubled cells, polynomial degree 2, C = 1/(b - a), with [a, b] the computational domain.
(a) Sod's shock tube, 128 elements. (b) Lax's shock tube, 128 elements. (c) Blast-wave problem,
512 elements. (d) Shu-Osher problem, 512 elements
1 Introduction


In recent decades, high-order discontinuous Galerkin (DG) methods have been 
gaining increasing popularity for high-accuracy solutions of systems of conservation 
laws, such as the compressible Euler and Navier-Stokes equations [5, 6, 22]. The 
lack of a continuity constraint on element interfaces makes DG methods robust for 
describing advection-dominated problems when an appropriate Riemann solver is 
selected [5, 12, 22]. 

Multigrid methods speed up the iterative solution of large systems of equations 
using coarse-grid representations (lower levels). Iterative methods (known as 
smoothers in the multigrid community) are good at eliminating the high frequencies 
of the error fast; therefore, when applied to coarse-grid representations, they also 
reduce the low frequencies of the error. They have been broadly used in the high- 
order community in recent years in the form of p-multigrid [2, 8] (where levels are 
constructed using different polynomial orders) and hp-multigrid [14, 21] (where 
both the order and size of the elements are changed). Two types of multigrid 
methods can be found in the literature: linear and nonlinear multigrid. In our 
work, we make use of the nonlinear multigrid scheme, also known as the Full 
Approximation Scheme (FAS), since it enables the estimation of the truncation 
error of coarse representations, as will be shown. The smoother can be either a 
time-marching scheme (implicit or explicit), or an iterative method applied to the 
linearized problem. 
Because of the allowed discontinuities on element interfaces, DG methods are
capable of handling non-conforming meshes with hanging nodes and/or different 
polynomial orders efficiently [7, 13, 15]. It is possible to take advantage of this 
feature to accelerate the computations through local adaptation strategies. Local 
adaptation can be performed by subdividing or merging elements (h-adaptation) or 
by enriching or reducing the polynomial order in certain elements (p-adaptation). 
The main idea behind these methodologies is to reduce the number of degrees 
of freedom (NDOF) while maintaining a high accuracy, which translates into 
shorter computational times and reduced storage requirements. Furthermore, since 
several 2D and 3D implementations of the DG methods use tensor-product basis 
functions, it is possible to adapt the polynomial order in each coordinate direction 
independently. In order to identify the localized regions that need increased or 
decreased accuracy, an error estimator is commonly used. 
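The effect of anisotropic p-adaptation on the number of degrees of freedom can be made concrete with a small sketch. It assumes tensor-product bases, so that an element with polynomial orders $(N_1, \ldots, N_d)$ carries $\prod_i (N_i + 1)$ degrees of freedom per conserved variable; the function name `count_ndof` and the sample meshes are hypothetical.

```python
from math import prod


def count_ndof(element_orders):
    """Degrees of freedom per conserved variable for tensor-product elements.

    element_orders: one tuple of polynomial orders per element, e.g. (Nx, Ny).
    """
    return sum(prod(order + 1 for order in orders) for orders in element_orders)


uniform = [(5, 5)] * 4                       # order 5 in both directions everywhere
adapted = [(5, 5), (5, 2), (2, 2), (1, 1)]   # orders reduced where little accuracy is needed
print(count_ndof(uniform), count_ndof(adapted))  # 144 67
```

In this toy mesh, lowering the order in one direction only, as in the element with orders (5, 2), already halves the cost of that element.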

There are several approaches to estimate the error and drive an adaptation 
method. In this work, we focus on truncation error estimates since it has been shown 
that a reduction of the truncation error controls the numerical accuracy of all func- 
tionals [10], hence reducing the truncation error necessarily leads to a more accurate 
lift and drag. The τ-estimation method [4] is a way to estimate the truncation error
locally that has been used to drive mesh adaptation strategies in low-order [9, 20] 
and high-order methods [10, 17, 18]. The adaptation strategy consists in converging 
a high order representation (reference mesh) to a specified global residual and then 
performing a single error estimation followed by a corresponding mesh adaptation 
process. Rueda-Ramírez et al. [19] developed a new method for estimating the 
truncation error of anisotropic representations that is cheaper to evaluate than 
previous implementations, and showed that it produces very accurate extrapolations 
of the truncation error, which enables the use of coarser reference meshes. 

In this work, we employ the anisotropic truncation error estimator developed
in [19] and the anisotropic p-adaptation method detailed in [18] to accelerate the
computation of the compressible steady viscous flow past a NACA0012 at angle of
attack 5°, $Re_\infty = 200$ based on the airfoil chord, and $M_\infty = 0.2$. These particular
settings correspond to a steady laminar flow, but the proposed method can be directly
used with any steady solution (e.g. RANS). The paper is organized as follows: In
Sect. 2, we briefly describe the methods used in this paper. In Sect. 3, we compare
the performance of the proposed methods with traditional strategies for solving the
flow past a NACA0012 and show the speed-up advantages for different accuracies.
Finally, the conclusions are summarized in Sect. 4.


2 Methods 


2.1 DG Method 


We consider the approximation of systems of conservation laws, 


\[
q_t + \nabla \cdot \mathcal{F} = s, \tag{1}
\]


Ап Anisotropic p-Adaptation Multigrid Scheme for the DGSEM 551 


where q is the vector of conserved variables, „Я is the flux dyadic tensor, and s is 
a source term. The domain 42 is partitioned in a mesh 7 = {е} consisting of К 
non-overlapping elements 42°. Multiplying equation (1) by a test function v and 
integrating by parts over each subdomain 42° yields the weak formulation: 


[ aava- | F ууа“ + | Fnac" = | svd2*. Q) 
Re e ane 


e 


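Let $q$, $s$, $\mathscr{F}$ and $v$ be approximated by piecewise polynomial
functions defined in the space of $L^2$ functions
$\mathscr{V}^N = \{ v^N \in L^2(\Omega) : v^N|_{\Omega^e} \in \mathscr{P}^N(\Omega^e) \ \forall\, \Omega^e \in \mathscr{T} \}$,
where $\mathscr{P}^N(\Omega^e)$ is the space of polynomials of degree at most $N$.
The functions in $\mathscr{V}^N$ can be represented in each element as a linear
combination of basis functions $\phi_j^N \in \mathscr{P}^N(\Omega^e)$ (e.g.
$q|_{\Omega^e} \approx q^N|_{\Omega^e} = \sum_j Q_j^N \phi_j^N$), where the
$\phi_j^N$ are usually tensor-product expansions. After some manipulations, the
discontinuous Galerkin finite element discretization is obtained:


$$[M]\,\partial_t \mathbf{Q}^N + \mathbf{F}(\mathbf{Q}^N) = [M]\,\mathbf{S}^N, \tag{3}$$


where $[M]$ is the mass matrix and $\mathbf{F}$ is a nonlinear operator, which are
the assembled global versions of the element-wise mass matrices and nonlinear
operators:


$$[M]^e_{i,j} = \int_{\Omega^e} \phi_i \phi_j \, d\Omega^e, \tag{4}$$


$$\mathbf{F}(\mathbf{Q})^e_j = \sum_{i=1}^{NDOF^e} \left[ -\int_{\Omega^e} \mathscr{F}_i^e \cdot \nabla\phi_j \, \phi_i \, d\Omega^e \right] + \int_{\partial\Omega^e} \mathscr{F}^*(\mathbf{Q}^{e+}, \mathbf{Q}^{e-}, n)\, \phi_j \, d\sigma^e, \tag{5}$$


where $\mathscr{F}_i^e$ is the $i$-th entry of the vector $\boldsymbol{\mathscr{F}}^e$,
which contains the value of $\mathscr{F}$ for all the degrees of freedom of
element $e$. In the rest of this paper, bold uppercase Roman letters and bold
Greek letters denote vectors spanning several degrees of freedom, unless
otherwise specified.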

The numerical flux function $\mathscr{F}^*$ uniquely defines the flux at the
element interfaces and weakly prescribes the boundary data as a function of the
conserved variables on both sides of the boundary/interface and the normal
vector. In the present work, we use the scheme by Roe [16] as the advective
Riemann solver and the original scheme by Bassi and Rebay [1] (BR1) as the
diffusive Riemann solver.


2.2 Full Approximation Scheme p-Multigrid 


The Full Approximation Scheme (FAS) is a nonlinear version of the multigrid
method that is especially suited to solving systems of nonlinear equations [4].
Starting from Eq. (3) and defining the operator
$\mathbf{A}^N(\mathbf{Q}^N) = [M]^{-1}\mathbf{F}(\mathbf{Q}^N)$, the steady-state
problem of order $P$ reads


$$\mathbf{A}^P(\mathbf{Q}^P) = \mathbf{S}^P. \tag{6}$$


After $\beta_1$ sweeps of a smoother, a non-converged solution
$\tilde{\mathbf{Q}}^P$ is obtained that has an associated discretization error
$\boldsymbol{\epsilon}^P = \mathbf{Q}^P - \tilde{\mathbf{Q}}^P$. The FAS multigrid
procedure consists of obtaining an approximation to the discretization error on
a coarse grid of order $N$ and projecting it onto the original problem of
order $P$:


$$\boldsymbol{\epsilon}^P \approx I_N^P \boldsymbol{\epsilon}^N = I_N^P\left(\mathbf{Q}^N - I_P^N \tilde{\mathbf{Q}}^P\right), \tag{7}$$


where $I_N^P$ is an $L^2$ projection operator $N \to P$ and $\mathbf{Q}^N$ is the
solution to the coarse-grid problem:


$$\mathbf{A}^N(\mathbf{Q}^N) = \mathbf{S}^N, \tag{8}$$


where the source term is defined as


$$\mathbf{S}^N = \mathbf{A}^N(I_P^N \tilde{\mathbf{Q}}^P) + I_P^N\left(\mathbf{S}^P - \mathbf{A}^P(\tilde{\mathbf{Q}}^P)\right). \tag{9}$$


In practice, several p-multigrid levels are used in V- or W-cycles. The smoothing
steps that are performed when coarsening are called pre-smoothing sweeps, and the
ones performed when refining back are called post-smoothing sweeps. Furthermore,
$\mathbf{Q}^N$ is not obtained exactly on the coarse grids, but approximated
using an iterative method, $\mathbf{Q}^N \approx \tilde{\mathbf{Q}}^N$. In this
work, we use a third-order low-storage Runge-Kutta scheme (RK3) as the smoother
and V-cycles.
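
The two-level cycle of Eqs. (7)-(9) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy operator `make_operator`, the damped Richardson smoother that stands in for RK3, and the modal restriction and prolongation (truncation and zero padding, which is what an $L^2$ projection reduces to for an orthogonal modal basis).

```python
import numpy as np

def smooth(A, S, Q, omega, sweeps):
    """Damped Richardson sweeps, Q <- Q - omega * (A(Q) - S); stands in for RK3."""
    for _ in range(sweeps):
        Q = Q - omega * (A(Q) - S)
    return Q

def restrict(Q, n_coarse):
    """Projection P -> N for an orthogonal modal basis: drop the high modes."""
    return Q[:n_coarse].copy()

def prolong(Q, n_fine):
    """Projection N -> P for an orthogonal modal basis: pad with zeros."""
    out = np.zeros(n_fine)
    out[:Q.size] = Q
    return out

def fas_two_level(A_P, A_N, S_P, Q_P, n_coarse, omega_P, omega_N,
                  pre=2, coarse=20, post=2):
    """One two-level FAS cycle following Eqs. (7)-(9)."""
    Q_P = smooth(A_P, S_P, Q_P, omega_P, pre)               # pre-smoothing
    Q0_N = restrict(Q_P, n_coarse)                          # I_P^N Q~^P
    S_N = A_N(Q0_N) + restrict(S_P - A_P(Q_P), n_coarse)    # source term, Eq. (9)
    Q_N = smooth(A_N, S_N, Q0_N, omega_N, coarse)           # approximate Eq. (8)
    Q_P = Q_P + prolong(Q_N - Q0_N, Q_P.size)               # correction, Eq. (7)
    return smooth(A_P, S_P, Q_P, omega_P, post)             # post-smoothing

def make_operator(n):
    """Toy nonlinear operator A(Q)_k = k*Q_k + 0.1*Q_k**3 (hypothetical)."""
    d = np.arange(1.0, n + 1.0)
    return lambda Q: d * Q + 0.1 * Q**3

n_P, n_N = 8, 4
A_P, A_N = make_operator(n_P), make_operator(n_N)
Q_exact = 1.0 / np.arange(1.0, n_P + 1.0)
S_P = A_P(Q_exact)                      # manufactured right-hand side
omega_P, omega_N = 0.9 / (n_P + 0.3), 0.9 / (n_N + 0.3)

Q = np.zeros(n_P)
res0 = np.linalg.norm(A_P(Q) - S_P)
for _ in range(10):
    Q = fas_two_level(A_P, A_N, S_P, Q, n_N, omega_P, omega_N)
res = np.linalg.norm(A_P(Q) - S_P)
print(f"residual: {res0:.3e} -> {res:.3e}")
```

A useful property to check in any FAS implementation is that the converged solution is a fixed point of the cycle: when $\tilde{\mathbf{Q}}^P$ already satisfies Eq. (6), the coarse-grid source term reduces to $\mathbf{A}^N(I_P^N \tilde{\mathbf{Q}}^P)$ and the correction vanishes.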


2.3 τ-Based p-Adaptation


In this section we show how to drive an anisotropic p-adaptation procedure using
the truncation error, which is estimated in the multigrid procedure.


2.3.1 The Anisotropic τ-Estimation Method


The non-isolated truncation error of a discretization of order $N$ is defined as


$$\tau^N = \mathcal{R}^N(I^N q) - \mathcal{R}(q), \tag{10}$$


where $q$ is the exact solution to the problem, $I^N$ is a discretizing operator,
$\mathcal{R}$ is the continuous partial differentiation operator, and
$\mathcal{R}^N$ is the discrete partial differentiation operator. From Eqs. (1)
and (3):


$$\mathcal{R}(q) = s - \nabla \cdot \mathscr{F}, \tag{11}$$

$$\mathbf{R}^N(\mathbf{I}^N q) = [M]\mathbf{S}^N - \mathbf{F}(\mathbf{I}^N q), \tag{12}$$


where $\mathbf{I}^N$ is an operator that samples the exact solution at the points
that correspond to the degrees of freedom of a representation of order $N$, and
therefore Eq. (12) corresponds to the sampled values of $\mathcal{R}^N(I^N q)$.

Note that in steady cases, $\mathcal{R}(q) = 0$ holds. Since the exact solution
$q$ is usually not at hand, we use the quasi a priori τ-estimation method, which
approximates the exact solution with the non-converged solution on a high-order
grid, $q \approx \tilde{q}^P$, where $N < P$. Therefore, the steady non-isolated
truncation error estimate reads


$$\tau_P^N = \mathcal{R}^N(I_P^N \tilde{q}^P) \;\rightarrow\; \boldsymbol{\tau}_P^N = \mathbf{R}^N(I_P^N \tilde{\mathbf{Q}}^P) = [M]\mathbf{S}^N - \mathbf{F}(I_P^N \tilde{\mathbf{Q}}^P). \tag{13}$$


On the left side of the arrow is the estimate of the truncation error that lives
in the space $\mathscr{V}^N$, and on the right side is the sampled form of the
truncation error estimate at the points that correspond to the degrees of
freedom. In a DG representation, one can also define the isolated truncation
error $\hat{\tau}$ as


$$\hat{\boldsymbol{\tau}}_P^N = \hat{\mathbf{R}}^N(I_P^N \tilde{\mathbf{Q}}^P) = [M]\mathbf{S}^N - \hat{\mathbf{F}}(I_P^N \tilde{\mathbf{Q}}^P), \tag{14}$$


where $\hat{\mathbf{F}}$ is the assembled version of the isolated nonlinear
operator, defined elementwise as


$$\hat{\mathbf{F}}(\mathbf{Q})^e_j = \sum_{i=1}^{NDOF^e} \left[ -\int_{\Omega^e} \mathscr{F}_i^e \cdot \nabla\phi_j \, \phi_i \, d\Omega^e \right] + \int_{\partial\Omega^e} \mathscr{F}^N \cdot n \, \phi_j \, d\sigma^e. \tag{15}$$


Note that Eq. (15) is Eq. (5) without substituting $\mathscr{F}$ by the numerical
flux $\mathscr{F}^*$. This change eliminates the influence of the neighboring
elements and boundaries on the truncation error of each element. We drop the hat
notation in the next statements since they are valid for both the isolated and
non-isolated truncation error.

The τ-estimation method can also be used with anisotropic representations, i.e.


$$\tau^{N_1 N_2}_{P_1 P_2} = \mathcal{R}^{N_1 N_2}\left(I^{N_1 N_2}_{P_1 P_2} \tilde{q}^{P_1 P_2}\right), \tag{16}$$


where $N_i$ and $P_i$ are the polynomial orders in direction $i$ of the analyzed
representation and the high-order reference solution, respectively, with
$N_i < P_i$. Additionally, Rueda-Ramírez et al. [19] showed that the truncation
error of an anisotropic representation can be estimated using directional
components:


$$\tau^{N_1 N_2}_{P_1 P_2} \approx \tau^{N_1 P_2}_{P_1 P_2} + \tau^{P_1 N_2}_{P_1 P_2} = \tau_1 + \tau_2, \tag{17}$$


where the directional components in discrete form (coarsening only direction
$i$, with $j \neq i$) are therefore


$$\boldsymbol{\tau}_i = \boldsymbol{\tau}^{N_i P_j}_{P_i P_j} = [M]\mathbf{S}^{N_i P_j} - [M]\mathbf{A}^{N_i P_j}\left(I^{N_i P_j}_{P_i P_j} \tilde{\mathbf{Q}}^{P_i P_j}\right), \tag{18}$$


and that these directional components decrease exponentially with the polynomial
order for smooth solutions. Consequently, it is possible to use a semi-converged
solution $\tilde{q}^{P_1 P_2}$ to estimate $\tau^{N_1 N_2}$ ($N_i < P_i$) and then
extrapolate the directional components $\tau_i$ to obtain the values of
$\tau^{N_1 N_2}$ for $N_i \geq P_i$. Figure 1a shows a graphical representation of
the truncation error $\tau^{N_1 N_2}$ as estimated with a semi-converged solution
of order $P_1 = P_2 = 5$.
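
The extrapolation step relies only on the exponential decay of the directional components, so it can be sketched with a log-linear fit. The snippet below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the inputs are norms of the directional components for $N_i = 1, \dots, P_i - 1$, the fit is an ordinary least-squares line through $\log_{10}\|\tau_i\|$, and the synthetic decay rates are made up.

```python
import numpy as np

def extrapolate_directional(tau, n_max):
    """Fit log10(tau) = a + b*N on the estimated orders and extend it to n_max."""
    orders = np.arange(1, tau.size + 1)
    b, a = np.polyfit(orders, np.log10(tau), 1)   # slope, intercept
    all_orders = np.arange(1, n_max + 1)
    out = 10.0 ** (a + b * all_orders)
    out[:tau.size] = tau                          # keep the estimated values
    return out

def truncation_error_map(tau1, tau2, n_max):
    """Map with entry [N1-1, N2-1] = tau_1(N1) + tau_2(N2), in the spirit of Eq. (17)."""
    t1 = extrapolate_directional(tau1, n_max)
    t2 = extrapolate_directional(tau2, n_max)
    return t1[:, None] + t2[None, :]

# Synthetic directional components for N_i = 1..4 (reference order P_i = 5).
N_est = np.arange(1, 5)
tau1 = 0.5 * 10.0 ** (-0.8 * N_est)
tau2 = 2.0 * 10.0 ** (-0.5 * N_est)
tau_map = truncation_error_map(tau1, tau2, n_max=7)
print(tau_map.shape)
```

Entries with $N_i < P_i$ come from the estimate and the remaining ones from the fitted line, which mirrors the split between the estimated and extrapolated regions of the truncation error map.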


2.3.2 The p-Adaptation Multigrid Scheme 


It has been shown that the use of FAS p-multigrid methods speeds up the
computation of steady-state and unsteady solutions of the compressible
Navier-Stokes equations [2, 8]. In addition, Rueda-Ramírez et al. [18] showed
that the truncation error of an anisotropic representation can be obtained
inexpensively inside an anisotropic p-multigrid cycle that performs the
coarsening in one coordinate direction at a time. In fact, the second term of
Eq. (18) is naturally computed in an anisotropic multigrid cycle when obtaining
the coarse-grid source term (Eq. (9)).

Therefore, we propose a p-adaptation multigrid scheme that uses the multigrid
not only as a solver but also as an error estimator. Every time the error is
estimated, an anisotropic p-multigrid strategy is used to generate a truncation
error map for each element, like the one in Fig. 1a. Afterwards, the polynomial
orders in the different coordinate directions are selected for each element such
that a truncation error threshold $\tau_{\max}$ is achieved with the minimum
possible number of degrees of freedom (NDOF), as illustrated in Fig. 1b. In the
simulations shown in this paper, the reference representation, $q^P$, is
converged to a residual of $\tau_{\max}/10$ before the p-adaptation stage, so
that the truncation error is accurately estimated down to $\tau_{\max}$, as was
shown to be necessary by Kompenhans et al. [10].

Fig. 1 (a) Truncation error map for a specific element, showing
$\log \|\tau^{N_1 N_2}\|_\infty$ as a function of $N_1$ and $N_2$ (the black box
marks the limit between the estimated and extrapolated maps). (b) Map of degrees
of freedom (the black boxes mark the polynomial orders that achieve the
prescribed truncation error threshold)
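
The selection of polynomial orders from a truncation error map amounts to a small constrained search per element. The sketch below is only an illustration of that idea: `select_orders` is a hypothetical helper, the map is synthetic, the 2D count $(N_1 + 1)(N_2 + 1)$ is used for NDOF, and falling back to the maximum order when the threshold cannot be reached is an assumption.

```python
import numpy as np

def select_orders(tau_map, tau_max):
    """Return (N1, N2) with the fewest DOFs whose estimated error is <= tau_max.

    tau_map[i, j] holds the estimated truncation error for N1 = i+1, N2 = j+1.
    """
    n1, n2 = tau_map.shape
    best, best_ndof = None, None
    for i in range(n1):
        for j in range(n2):
            ndof = (i + 2) * (j + 2)          # (N1 + 1) * (N2 + 1)
            if tau_map[i, j] <= tau_max and (best is None or ndof < best_ndof):
                best, best_ndof = (i + 1, j + 1), ndof
    if best is None:                          # threshold not reachable in the map
        best = (n1, n2)
    return best

# Synthetic map: direction 2 converges twice as fast as direction 1.
orders = np.arange(1, 8)
tau_map = (10.0 ** -orders)[:, None] + (10.0 ** (-2 * orders))[None, :]
print(select_orders(tau_map, 1e-3))
```

In this synthetic case the search returns an anisotropic pair with a higher order in the slowly converging direction, which is the behavior the anisotropic adaptation is meant to exploit.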


3 Flow Past a NACA0012 Airfoil 


In this section, we compare the performance of the proposed p-adaptation
multigrid scheme with a uniformly adapted p-multigrid method (without local
p-adaptation) and a uniformly adapted RK3 method when solving the steady viscous
flow past a NACA0012 airfoil at an angle of attack of 5°, $Re_\infty = 200$
($L_\infty = L_{\mathrm{chord}}$) and $M_\infty = 0.2$. These settings correspond
to a steady laminar flow, but the proposed method can be used directly with any
steady solution (e.g. RANS). An unstructured mesh of 2011 quadrilateral elements
is employed (Fig. 2).

In the cases where multigrid is employed, the RK3 scheme is used as the
iterative method (smoother), so that additional speed-ups are due only to the
methods presented in Sect. 2. As in [18], a residual-based smoothing strategy is
used. The minimum number of smoothing sweeps is $\beta = 200$ for the coarsest
multigrid level ($N = 1$) and $\beta = 50$ for any other level. After every
$\beta$ pre-smoothing sweeps, the residual in the next (coarser) representation
is checked. If

$\|\mathbf{r}^P\|_\infty < 1.2\,\|\mathbf{r}^N\|_\infty$, the pre-smoothing is
stopped; otherwise, $\beta$ additional sweeps are performed. Similarly, the norm
of the residual after the post-smoothing is forced to be at least as low as it
was after the pre-smoothing,
$\|\mathbf{r}^P_{\mathrm{post}}\|_\infty \le \|\mathbf{r}^P_{\mathrm{pre}}\|_\infty$.
If that condition is not fulfilled, $\beta$ additional sweeps are taken until it
is.

Fig. 2 Pressure contours of the flow past a NACA0012 at angle of attack 5°
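
The residual-based pre-smoothing loop can be sketched as follows. This is an illustration under assumptions rather than the solver used in the paper: `presmooth` is a hypothetical helper, the factor `eta` and the block size `beta` are free parameters here, the smoother is a damped Richardson iteration on a toy modal problem, and the coarse residual is taken as the fine residual restricted to the low modes.

```python
import numpy as np

def presmooth(Q, sweep, res_fine, res_coarse, beta, eta, max_blocks=1000):
    """Apply blocks of `beta` sweeps until ||r_fine|| < eta * ||r_coarse||."""
    n_sweeps = 0
    for _ in range(max_blocks):
        for _ in range(beta):
            Q = sweep(Q)
        n_sweeps += beta
        if res_fine(Q) < eta * res_coarse(Q):
            break
    return Q, n_sweeps

# Toy modal problem (hypothetical): A(Q)_k = k*Q_k + 0.1*Q_k**3, exact Q = 1.
d = np.arange(1.0, 9.0)
S = d + 0.1
residual = lambda Q: d * Q + 0.1 * Q**3 - S
omega = 0.9 / (d.max() + 0.3)
sweep = lambda Q: Q - omega * residual(Q)
res_fine = lambda Q: np.max(np.abs(residual(Q)))           # infinity norm, all modes
res_coarse = lambda Q: np.max(np.abs(residual(Q)[:4]))     # low modes only

Q0 = np.zeros(8)
Q, n_sweeps = presmooth(Q0, sweep, res_fine, res_coarse, beta=5, eta=1.2)
print(n_sweeps, res_fine(Q))
```

The stopping test is satisfied once the residual is dominated by components that the coarser level can represent, which is the point at which further fine-level smoothing becomes inefficient.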

The isolated truncation error estimate is used to drive the p-adaptation method
since it has been shown to provide better results than the non-isolated one
[17-19]. The conservative form (Eq. (1)) of the compressible Navier-Stokes
equations is discretized using the Discontinuous Galerkin Spectral Element
Method (DGSEM) [3, 12], which is a nodal (collocation) version of a DG method
that uses Gauss points as both the solution nodes and the quadrature points,
thereby obtaining diagonal mass matrices. However, the methods presented here
can be applied to any DG scheme with tensor-product basis functions.
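
The diagonal mass matrix follows from collocating the Lagrange basis with the Gauss quadrature, and this can be checked numerically in 1D. The sketch below assumes the reference element $[-1, 1]$ and an arbitrary order $N = 4$; `lagrange_basis` is a hypothetical helper. Since $\phi_i \phi_j$ has degree $2N$, both the $(N+1)$-point Gauss rule and the finer rule used for the check integrate it exactly.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def lagrange_basis(nodes, x):
    """Evaluate all Lagrange polynomials defined on `nodes` at the points `x`.

    Returns an array of shape (len(nodes), len(x)).
    """
    L = np.ones((nodes.size, x.size))
    for i in range(nodes.size):
        for m in range(nodes.size):
            if m != i:
                L[i] *= (x - nodes[m]) / (nodes[i] - nodes[m])
    return L

N = 4
nodes, weights = leggauss(N + 1)      # Gauss points used as solution nodes
xq, wq = leggauss(2 * N + 2)          # finer rule, exact for degree 2N
phi = lagrange_basis(nodes, xq)
M = (phi * wq) @ phi.T                # M_ij = sum_q w_q phi_i(x_q) phi_j(x_q)
print(np.round(M, 12))
```

The computed matrix coincides with `np.diag(weights)` up to rounding, so inverting $[M]$ in $\mathbf{A}^N = [M]^{-1}\mathbf{F}$ reduces to a division by the quadrature weights (times the mapping Jacobian on a physical element).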

In [18] it was explained that, when using the DGSEM on general 3D curved meshes
with p-nonconforming representations, the order of the mapping must satisfy
$M \le N/2$ for the numerical representation to be free-stream preserving. For
this reason, the use of a conforming algorithm was proposed, which forces the
polynomial orders to be conforming in the first layer of elements on a curved
boundary. The conforming algorithm is necessary to retain the well-known
$M \le N$ condition of the DGSEM [11]. In this work, we use the conforming
algorithm on the airfoil surface since it was found to produce better results,
although its use is not imperative because the considered test case is 2D.

For the uniformly adapted cases, the polynomial order is varied between $N = 2$
and $N = 7$. For the cases with local p-adaptation, a single-stage anisotropic
p-adaptation procedure is performed, and the minimum polynomial order after
adaptation is set to $N_{\min} = 1$, whereas the maximum polynomial order after
adaptation is set to $N_{\max} = 7$. The relative drag and lift errors of the
adapted meshes are assessed by comparing with a reference solution of order
$N = 8$:


$$e_{\mathrm{drag}} = \frac{|C_d - C_d^{N=8}|}{|C_d^{N=8}|}, \qquad e_{\mathrm{lift}} = \frac{|C_l - C_l^{N=8}|}{|C_l^{N=8}|}. \tag{19}$$


Figure 3 shows a comparison between the errors obtained using the τ-based
adaptation procedure and the ones obtained using uniform p-refinement. As can be
observed, the number of degrees of freedom is substantially reduced for the same
accuracy when using the τ-based p-adaptation. This reduction translates into a
reduction of the CPU time. It is interesting to point out that, as the isolated
truncation error threshold $\tau_{\max}$ is decreased, the polynomial orders of
the mesh tend to the maximum specified polynomial order, $N_{\max} = 7$.
Consequently, the lift and drag coefficients also tend to the values obtained
with uniform order $N = 7$. Using Fig. 3, it is possible to compute a speed-up
for different levels of accuracy. Table 1 summarizes the speed-up calculations
for the maximum level of accuracy that was achieved for the drag and lift
coefficients.

Fig. 3 Relative error in the drag and lift coefficients for different methods
for the flow past the NACA0012 airfoil. The blue lines represent uniform
refinement, and the red lines represent the τ-based p-adaptation procedure with
$N_{\max} = 7$. (a) Drag error vs. DOFs; (b) lift error vs. DOFs; (c) drag error
vs. CPU time; (d) lift error vs. CPU time


Table 1 Computation times and speed-up for the different methods after converging until $\|\mathbf{r}\|_\infty <$

Lift coefficient ($e_{\mathrm{lift}} < 2.4 \times 10^{-5}$)
Method               Speed-up
RK3                  1.00
FAS                  12.10% | 8.26
FAS + p-adaptation   1.21 x 10° 6.20% | 16.13 1.48x 10° 7.58% | 13.19


Figure 4 shows the distribution of polynomial orders after the single-stage
adaptation procedure for a threshold of $\tau_{\max} = 5 \times 10^{-4}$, which
has associated errors $e_{\mathrm{drag}} = 4.10 \times 10^{-?}$ and
$e_{\mathrm{lift}} = 7.31 \times 10^{-5}$. As can be observed, the elements that
are enriched are mainly the ones in the boundary layer (especially at the
leading and trailing edges) and in the zones of the wake where the element size
changes significantly.

Fig. 4 Polynomial order distribution after the anisotropic p-adaptation. $N_{\mathrm{ave}} = (N_1 + N_2)/2$


4 Conclusions 


In this work, we have applied recently developed error estimators and anisotropic
p-adaptation methods in conjunction with multigrid solving strategies to solve
the compressible Navier-Stokes equations. In particular, we have shown that
coupling anisotropic truncation error-based p-adaptation methods with
p-multigrid schemes can speed up the computation of steady-state solutions of
PDEs. The achieved speed-up depends on the desired accuracy, and the method is
most advantageous when high accuracy (low error) is required. In particular, a
speed-up of 16.13 was achieved for the computation of the steady compressible
viscous flow past a NACA0012 airfoil at an angle of attack of 5° with respect to
the uniformly adapted representation without multigrid.


A Spectral Element Reduced Basis Method for Navier-Stokes Equations
with Geometric Variations


Martin W. Hess, Annalisa Quaini, and Gianluigi Rozza 


1 Introduction and Motivation 


Spectral element methods (SEM) use high-order polynomial ansatz functions to 
solve partial differential equations (PDEs) in all fields of science and engineering, 
see, e.g., [4-7, 12, 16] and references therein for an overview. Typically, an
exponential error decay under p-refinement is observed, which can provide an 
enhanced accuracy over standard finite element methods at the same computational 
cost. In the following, we assume that the discretization error is much smaller than 
the model reduction error, small enough not to interfere with our results. In general, 
this needs to be established with the use of suitable error estimation and adaptivity 
techniques. 

We consider the flow through a channel with a narrowing of variable height.
A reduced order model (ROM) is computed from a few high-order SEM solves,
which accurately approximates the high-order solutions for the parameter range
of interest, i.e., the different narrowing heights under consideration. Since the
parametric variations are affine, a mapping to a reference domain is applied
without further interpolation techniques. The focus of this work is to show how
to use simulations arising from the SEM solver Nektar++ [3] in a ROM context. In
particular, the multilevel static condensation of the high-order solver is not
applied; instead, the ROM projection works with the system matrices in local
coordinates. See [12] for further details. This is in contrast to our previous
work [8], since numerical experiments have shown that the multilevel static
condensation is inefficient in a ROM context. Additionally, we consider affine
geometry variations. With SEM as the discretization method, we use global
approximation functions for the high-order as well as the reduced-order methods.
The ROM techniques described in this paper are implemented in the open-source
project ITHACA-SEM.¹

The outline of the paper is as follows. In Sect. 2, the model problem is defined
and the geometric variations are introduced. Section 3 provides details on the
spectral element discretization, while Sect. 4 describes the model reduction
approach and shows the affine mapping to the reference domain. Numerical results
are given in Sect. 5, while Sect. 6 summarizes the work and points out future
perspectives.


2 Problem Formulation 


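Let $\Omega \subset \mathbb{R}^2$ be the computational domain. Incompressible,
viscous fluid motion in the spatial domain $\Omega$ over a time interval $(0, T)$
is governed by the incompressible Navier-Stokes equations with vector-valued
velocity $u$, scalar-valued pressure $p$, kinematic viscosity $\nu$ and a body
forcing $f$:


$$\frac{\partial u}{\partial t} + u \cdot \nabla u = -\nabla p + \nu \Delta u + f, \tag{1}$$


$$\nabla \cdot u = 0. \tag{2}$$


Boundary and initial conditions are prescribed as


$$u = d \quad \text{on } \Gamma_D \times (0, T), \tag{3}$$
$$\nabla u \cdot n = g \quad \text{on } \Gamma_N \times (0, T), \tag{4}$$
$$u = u_0 \quad \text{in } \Omega \times 0, \tag{5}$$


with $d$, $g$ and $u_0$ given and $\partial\Omega = \Gamma_D \cup \Gamma_N$,
$\Gamma_D \cap \Gamma_N = \emptyset$. The Reynolds number $Re$, which
characterizes the flow [11], depends on $\nu$, a characteristic velocity $U$,
and a characteristic length $L$:


$$Re = \frac{UL}{\nu}. \tag{6}$$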

We are interested in computing the steady states, i.e., solutions where
$\partial u / \partial t$ vanishes. The high-order simulations are obtained
through time-advancement, while the ROM solutions are obtained with a
fixed-point iteration.


¹ https://github.com/mathLab/ITHACA-SEM


2.1 Oseen-Iteration


The Oseen-iteration is a secant modulus fixed-point iteration, which in general
exhibits a linear rate of convergence [2]. Given a current iterate (or initial
condition) $u^k$, the next iterate $u^{k+1}$ is found by solving the linear
system:


$$-\nu \Delta u^{k+1} + (u^k \cdot \nabla) u^{k+1} + \nabla p = f \quad \text{in } \Omega,$$
$$\nabla \cdot u^{k+1} = 0 \quad \text{in } \Omega,$$
$$u^{k+1} = d \quad \text{on } \Gamma_D,$$
$$\nabla u^{k+1} \cdot n = g \quad \text{on } \Gamma_N.$$


Iterations are typically stopped when the relative difference between iterates
falls below a predefined tolerance in a suitable norm, such as the $L^2(\Omega)$
or $H_0^1(\Omega)$ norm.
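
The structure of the iteration (freeze the advecting velocity, solve a linear problem, stop on the relative difference between iterates) can be sketched on a much simpler analogue. The example below is an assumption-laden illustration, not the solver of this paper: it uses the 1D steady viscous Burgers equation $-\nu u'' + u u' = f$ with homogeneous Dirichlet conditions, central finite differences instead of spectral elements, and the discrete Euclidean norm for the stopping test; the function name is hypothetical.

```python
import numpy as np

def oseen_like_iteration(nu=1.0, n=49, tol=1e-10, max_iter=100):
    """Fixed-point iteration -nu*u'' + u_k*u' = f on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    f = np.ones(n)
    # Second-difference and central first-difference matrices on interior nodes.
    D2 = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
          + np.diag(np.ones(n - 1), -1)) / h**2
    D1 = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2.0 * h)
    u = np.zeros(n)                                   # initial iterate
    for k in range(1, max_iter + 1):
        A = -nu * D2 + np.diag(u) @ D1                # operator linearized about u_k
        u_new = np.linalg.solve(A, f)
        rel = np.linalg.norm(u_new - u) / np.linalg.norm(u_new)
        u = u_new
        if rel < tol:                                 # relative-difference stopping test
            break
    residual = (-nu * D2 + np.diag(u) @ D1) @ u - f   # nonlinear residual at the result
    return u, k, np.linalg.norm(residual)

u, iterations, res_norm = oseen_like_iteration()
print(iterations, res_norm)
```

Each step only requires a linear solve, which is what makes this kind of iteration convenient in a ROM setting where the linearized operator can be assembled from precomputed pieces.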


2.2 Model Description 


We consider the reference computational domain shown in Fig. 1, which is
decomposed into 36 triangular spectral elements. The spectral element expansion
uses modal Legendre polynomials of the Koornwinder-Dubiner type of order
$p = 11$ for the velocity. Details on the discretization method can be found in
chapter 3.2 of [12]. The pressure ansatz space is chosen of order $p - 2$ to
fulfill the inf-sup stability condition [1, 20]. A parabolic inflow profile is
prescribed at the inlet (i.e., $x = 0$) with horizontal velocity component
$u_x(0, y) = y(3 - y)$ for $y \in [0, 3]$. At the outlet (i.e., $x = 8$) we
impose a stress-free boundary condition; everywhere else we prescribe a no-slip
condition.

The height of the narrowing in the reference configuration is $\mu = 1$, from
$y = 1$ to $y = 2$. See Fig. 1. The parameter $\mu$ is considered variable in
the interval $\mu \in [0.1, 2.9]$. The narrowing is shrunk or expanded so as to
keep the geometry symmetric about the line $y = 1.5$. Figures 2, 3, and 4 show
the velocity components close to the steady state for $\mu = 1, 0.1, 2.9$,
respectively.

Fig. 1 Reference computational domain for the channel flow, divided into 36 triangles

Fig. 2 Full order, steady-state solution for $\mu = 1$: velocity in x-direction (top) and y-direction (bottom)

Fig. 3 Full order, steady-state solution for $\mu = 0.1$: velocity in x-direction (top) and y-direction (bottom)

Fig. 4 Full order, steady-state solution for $\mu = 2.9$: velocity in x-direction (top) and y-direction (bottom)

The viscosity is kept constant at $\nu = 1$. For these simulations, the Reynolds
number (6) is between 5 and 10, with the maximum velocity in the narrowing as
characteristic velocity $U$ and the height of the narrowing as characteristic
length $L$. For larger Reynolds numbers (about 30), a supercritical pitchfork
bifurcation occurs, giving rise to the so-called Coanda effect [8, 9, 22], which
is not the subject of the current study. Our model is similar to the model
considered in [17, 18], i.e. an expansion channel with an inflow profile of
varying height. However, in [18] the computational domain itself does not
change.

3 Spectral Element Full Order Discretization 


The Navier-Stokes problem is discretized with the spectral element method. The
spectral/hp element software framework used is Nektar++ in version 4.4.0.² The
discretized system of size $N_\delta$ to solve at each step of the
Oseen-iteration for fixed $\mu$ can be written as


$$\begin{bmatrix} A & -D_{bnd}^T & B \\ -D_{bnd} & 0 & -D_{int} \\ \tilde{B}^T & -D_{int}^T & C \end{bmatrix} \begin{bmatrix} v_{bnd} \\ p \\ v_{int} \end{bmatrix} = \begin{bmatrix} f_{bnd} \\ 0 \\ f_{int} \end{bmatrix}, \tag{7}$$


where $v_{bnd}$ and $v_{int}$ denote velocity degrees of freedom on the boundary
and in the interior of the domain, respectively, while $p$ denotes the pressure
degrees of freedom. The forcing terms on the boundary and interior are denoted
by $f_{bnd}$ and $f_{int}$, respectively. The matrix $A$ assembles the
boundary-boundary coupling, $B$ the boundary-interior coupling, $\tilde{B}$ the
interior-boundary coupling, and $C$ assembles the interior-interior coupling of
elemental velocity ansatz functions. In the case of a Stokes system, it holds
that $B = \tilde{B}^T$, but this is not the case for the Oseen equation because
of the linearized convective term. The matrices $D_{bnd}$ and $D_{int}$ assemble
the pressure-velocity boundary and pressure-velocity interior contributions,
respectively.


² See www.nektar.info.

The linear system (7) is assembled in local degrees of freedom, resulting in
block matrices $A$, $B$, $\tilde{B}$, $C$, $D_{bnd}$ and $D_{int}$, each block
corresponding to a spectral element. This allows for an efficient matrix
assembly since each spectral element is independent of the others, but makes
the system singular. In order to solve the system, the local degrees of freedom
need to be gathered into the global degrees of freedom [12].

The high-order element solver Nektar++ uses a multilevel static condensation
for the solution of linear systems like (7). Since static condensation
introduces intermediate parameter-dependent matrix inversions (such as $C^{-1}$
in this case), several intermediate projection spaces need to be introduced to
use model order reduction [8]. This can be avoided by instead projecting the
expanded system (7) directly. The internal degrees of freedom do not need to be
gathered, since they are the same in local and global coordinates. Only ansatz
functions extending over multiple spectral elements need to be gathered.

Next, we take the boundary-boundary coupling across element interfaces into
account. Let $M$ denote the rectangular matrix which gathers the local boundary
degrees of freedom into global boundary degrees of freedom. Multiplication of
the first row of (7) by $M^T M$ then sets the boundary-boundary coupling in
local degrees of freedom:


$$\begin{bmatrix} M^T M A & -M^T M D_{bnd}^T & M^T M B \\ -D_{bnd} & 0 & -D_{int} \\ \tilde{B}^T & -D_{int}^T & C \end{bmatrix} \begin{bmatrix} v_{bnd} \\ p \\ v_{int} \end{bmatrix} = \begin{bmatrix} M^T M f_{bnd} \\ 0 \\ f_{int} \end{bmatrix}. \tag{8}$$


The action of the matrix in (8) on the degrees of freedom on the Dirichlet
boundary is computed and added to the right-hand side. Such degrees of freedom
are then removed from (8). The resulting system can then be used in a
projection-based ROM context [13]; it has the high-order dimension
$N_\delta \times N_\delta$ and depends on the parameter $\mu$:


$$\mathcal{A}(\mu)\, x(\mu) = f. \tag{9}$$


4 Reduced Order Model 


The reduced order model (ROM) computes accurate approximations to the
high-order solutions in the parameter range of interest, while greatly reducing
the overall computational time. This is achieved by two ingredients. First, a
few high-order solutions are computed and the most significant proper orthogonal
decomposition (POD) modes are obtained [13]. These POD modes define the reduced
order ansatz space of dimension $N$, in which the system is solved. Second, to
reduce the computational time, an offline-online computational procedure is
used. See Sect. 4.1.

The POD computes a singular value decomposition of the snapshot solutions and
retains the most dominant modes up to 99.99% of the energy [10]. These modes
define the projection matrix $U \in \mathbb{R}^{N_\delta \times N}$ used to
project system (9):


$$U^T \mathcal{A}(\mu) U x_N(\mu) = U^T f. \tag{10}$$


The low order solution $x_N(\mu)$ then approximates the high order solution as
$x(\mu) \approx U x_N(\mu)$.
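
The POD and projection steps can be sketched with a small dense example. The snippet is illustrative only and does not reproduce the ITHACA-SEM implementation: the parametrized matrix `system(mu)` is a made-up symmetric positive definite toy problem, the function names are hypothetical, and the energy criterion is taken as the cumulative sum of squared singular values.

```python
import numpy as np

def pod_basis(snapshots, energy=0.9999):
    """Left singular vectors capturing at least `energy` of sum(sigma**2)."""
    U, sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
    cumulative = np.cumsum(sigma**2)
    n_modes = int(np.searchsorted(cumulative / cumulative[-1], energy)) + 1
    n_modes = min(n_modes, sigma.size)
    return U[:, :n_modes], sigma

def solve_rom(A, f, U):
    """Galerkin projection as in Eq. (10): solve U^T A U x_N = U^T f, return U x_N."""
    x_N = np.linalg.solve(U.T @ A @ U, U.T @ f)
    return U @ x_N

# Toy parametrized system A(mu) = A0 + mu * A1 (hypothetical, SPD for mu > 0).
n = 30
A0 = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A1 = np.diag(np.linspace(0.5, 1.5, n))
f = np.ones(n)
system = lambda mu: A0 + mu * A1

train = np.linspace(0.1, 2.9, 20)
snapshots = np.column_stack([np.linalg.solve(system(mu), f) for mu in train])
U, sigma = pod_basis(snapshots)

mu_test = 1.234
x_full = np.linalg.solve(system(mu_test), f)
x_rom = solve_rom(system(mu_test), f, U)
rel_err = np.linalg.norm(x_full - x_rom) / np.linalg.norm(x_full)
print(U.shape[1], rel_err)
```

Note that `solve_rom` still projects the full matrix for every parameter value, so its cost depends on the high-order dimension; removing that dependence is the purpose of the offline-online decomposition.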


4.1 Offline-Online Decomposition 


The offline-online decomposition [10] enables the computational speed-up of
the ROM approach in many-query scenarios. It relies on an affine parameter
dependency, such that all computations depending on the high-order model size
can be moved into a parameter-independent offline phase, while having a fast
input-output evaluation online.

In the example under consideration here, the parameter dependency is already
affine and a mapping to the reference domain can be established without using an
approximation technique such as the empirical interpolation method. Thus, there
exists an affine expansion of the system matrix $\mathcal{A}(\mu)$ in the
parameter $\mu$ as


$$\mathcal{A}(\mu) = \sum_{i=1}^{Q} \Theta_i(\mu)\, \mathcal{A}_i. \tag{11}$$


The coefficients $\Theta_i(\mu)$ are computed from the mapping
$x = T_k(\mu)\hat{x} + g_k$, $T_k \in \mathbb{R}^{2 \times 2}$,
$g_k \in \mathbb{R}^2$, which maps the deformed subdomain $\hat{\Omega}_k$ to the
reference subdomain $\Omega_k$. See also [19, 21]. Figure 5 shows the reference
subdomains $\Omega_k$ for the problem under consideration.

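For each subdomain $\Omega_k$ the elemental basis function evaluations are
transformed to the reference domain. For each velocity basis function
$u = (u_1, u_2)$, $v = (v_1, v_2)$, $w = (w_1, w_2)$ and each (scalar) pressure
basis function $\psi$, we can write the transformation with summation convention
as:


$$\int_{\hat{\Omega}_k} \frac{\partial \hat{u}}{\partial \hat{x}_i} \frac{\partial \hat{v}}{\partial \hat{x}_i} \, d\hat{\Omega}_k = \int_{\Omega_k} \frac{\partial u}{\partial x_i} \nu_{ij} \frac{\partial v}{\partial x_j} \, d\Omega_k,$$

$$\int_{\hat{\Omega}_k} \hat{\psi} \, \nabla \cdot \hat{u} \, d\hat{\Omega}_k = \int_{\Omega_k} \psi \, \chi_{ij} \frac{\partial u_j}{\partial x_i} \, d\Omega_k,$$

$$\int_{\hat{\Omega}_k} (\hat{u} \cdot \nabla) \hat{v} \cdot \hat{w} \, d\hat{\Omega}_k = \int_{\Omega_k} u_i \, \pi_{ji} \frac{\partial v}{\partial x_j} \cdot w \, d\Omega_k,$$


with

$$\nu_{ij} = T_{ii'} \, \delta_{i'j'} \, T_{jj'} \det(T)^{-1},$$

$$\chi_{ij} = \pi_{ij} = T_{ij} \det(T)^{-1}.$$


Fig. 5 Reference computational domain with subdomains $\Omega_1$ (green), $\Omega_2$ (yellow), $\Omega_3$ (blue), $\Omega_4$
(grey) and $\Omega_5$ (brown)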

The subdomain $\Omega_5$ (see Fig. 5) is kept constant, so that no interpolation
of the inflow profile is necessary. To achieve fast reduced order solves, the
offline-online decomposition expands the system matrix as in (11) and computes
the parameter-independent projections offline, which are stored as small
matrices of size $N \times N$. Since in an Oseen-iteration each matrix depends
on the previous iterate, the submatrices corresponding to each basis function
are assembled and then formed online using the reduced basis coordinate
representation of the current iterate. This is the same procedure used for the
assembly of the nonlinear term in the Navier-Stokes case [13].
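
The offline-online split for an affine expansion such as (11) can be sketched as follows. All specifics are assumptions made for illustration: the three terms `A_terms`, the coefficient functions `thetas` (chosen as $1$, $\mu$ and $1/\mu$, a pattern that is typical of geometric scalings but is not taken from this paper), and a random orthonormal basis `U` standing in for the POD modes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 40, 5                                    # high-order and reduced dimensions

# Hypothetical affine terms A_i and coefficient functions Theta_i(mu).
A_terms = [4.0 * np.eye(n),
           0.05 * rng.standard_normal((n, n)),
           0.05 * rng.standard_normal((n, n))]
thetas = [lambda mu: 1.0, lambda mu: mu, lambda mu: 1.0 / mu]
f = rng.standard_normal(n)
U, _ = np.linalg.qr(rng.standard_normal((n, N)))   # stand-in for the POD basis

# Offline phase: parameter-independent projections, cost depends on n.
A_red = [U.T @ A_i @ U for A_i in A_terms]         # each of size N x N
f_red = U.T @ f

def solve_online(mu):
    """Online phase: assemble and solve the N x N system, independent of n."""
    A_N = sum(theta(mu) * A_i for theta, A_i in zip(thetas, A_red))
    return np.linalg.solve(A_N, f_red)

x_N = solve_online(0.7)
print(x_N.shape)
```

By linearity, the online system equals the projection $U^T \mathcal{A}(\mu) U$ of the full matrix, so the result matches a direct projection while the online cost involves only $N \times N$ matrices.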


5 Numerical Results 


The accuracy of the ROM is assessed using 40 snapshots sampled uniformly over
the parameter domain [0.1, 2.9] for the POD and 40 randomly chosen parameter
locations to test the accuracy. Figure 6 (left) shows the decay of the energy of
the POD modes. To reach the typical threshold of 99.99% of the POD energy, 9 POD
modes are needed as RB ansatz functions. Figure 6 (right) shows the relative
$L^2(\Omega)$ approximation error of the reduced order model with respect to the
full order model up to 6 digits of accuracy, evaluated at the 40 randomly chosen
verification parameter locations. With 9 POD modes, the maximum approximation
error is less than 0.7% and the mean approximation error is less than 0.5%.

Fig. 6 Left: Decay of POD mode energy. Right: Maximum (red) and mean (blue) relative $L^2(\Omega)$
error for the velocity over increasing reduced basis dimension

While the full-order solves were computed with Nektar++, the reduced-order
computations were done in ITHACA-SEM with a separate Python code. To assess
the computational gain, the time for a fixed-point iteration step using the
full-order system is compared to the time for a fixed-point iteration step of
the ROM with dimension 20, both done in Python. The ROM online phase reduces the
computational time by a factor of over 100. The offline time is dominated by
computing the snapshots and the trilinear forms used to project the advection
terms. See [13] for detailed explanations.


6 Conclusion and Outlook 


We showed that the POD reduced basis technique generates accurate reduced order
models for SEM-discretized models under parametric variation of the geometry.
The potential of a high-order spectral element method with a reduced basis ROM
is the subject of current investigations. See also [6]. Since each spectral
element comprises a block in the system matrix in local coordinates, a variant
of the reduced basis element method (RBEM) [14, 15] could be applied in the
future.


Iterative Spectral Mollification and Conjugation for Successive Edge
Detection


Robert E. Tuzun and Jae-Hun Jung 


1 Introduction 


Detection of edges is a fundamental problem in a variety of applications,
including image processing and the numerical solution of differential equations.
In applications such as magnetic resonance imaging (MRI), it is required to
construct images from Fourier data. Let
$\{\hat{f}_k \mid k = 0, \pm 1, \pm 2, \cdots\}$ be the set of Fourier
coefficients of $f(x) \in L^2[-\pi, \pi]$ given by


$$\hat{f}_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} \, dx,$$


and let $f_N$ be the Fourier partial sum
$f_N = \sum_{k=-N}^{N} \hat{f}_k e^{ikx}$. When the underlying function is smooth
and periodic, the Fourier reconstruction $f_N$ is accurate to spectral accuracy,
but when edges are present, the reconstruction is plagued by the Gibbs
phenomenon, also known as Gibbs ringing in MRI applications.
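
The Gibbs phenomenon is easy to reproduce numerically. The sketch below is an illustration that is not part of the original note: it evaluates the partial sum $f_N$ for the test function $f(x) = \mathrm{sgn}(x)$ on $[-\pi, \pi]$, whose coefficients under the definition above are $\hat{f}_k = -2i/(\pi k)$ for odd $k$ and zero otherwise; the function name and the choice $N = 64$ are arbitrary.

```python
import numpy as np

def fourier_partial_sum_sign(x, N):
    """Partial sum f_N(x) = sum_{|k| <= N} fhat_k exp(i k x) for f(x) = sgn(x)."""
    k = np.arange(-N, N + 1)
    fhat = np.zeros(k.size, dtype=complex)
    odd = (k % 2 != 0)
    fhat[odd] = -2j / (np.pi * k[odd])            # exact coefficients of sgn(x)
    return np.real(np.exp(1j * np.outer(x, k)) @ fhat)

x = np.linspace(-np.pi, np.pi, 4001)
f_N = fourier_partial_sum_sign(x, N=64)
print("max of f_N:", f_N.max())                  # overshoots the true maximum of 1
print("f_N(pi/2):", f_N[3000])                   # close to 1 away from the jump
```

Away from the discontinuities the partial sum approaches the function slowly, while near the jump at $x = 0$ it overshoots by a fixed fraction of the jump that does not decrease as $N$ grows; this is why edge locations are needed before an accurate reconstruction can be attempted.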

Various methods have been proposed to address these issues, and those methods
consist of edge detection followed by reconstruction. Thus, the determination of
edge locations is critical. The Fourier concentration method has emerged over
the past decade as a robust method for edge detection in a variety of
circumstances and applications [5, 6]. Essentially, a certain Fourier partial
sum converges to the jump function as the number of Fourier coefficients
increases, and this convergence can be accelerated by what are known as
concentration factors (functions). The use of different types of concentration
factors tends to impart trade-offs between oscillations near jump
discontinuities and significant non-zero concentration away from them [2].
Several methods have been devised to address this issue, as well as to treat
special circumstances such as incomplete Fourier data and the presence of
noise [1, 4, 13, 15].

Thanks to the convergence of the Fourier concentration to the jump function, the
concentration method detects edges as locations of large concentration. Where
the function is smooth, the concentration vanishes, as the jump function does,
as $N \to \infty$. In practice, the concentration method is designed to detect
edges with magnitudes larger than some given threshold, with the value of the
threshold being problem dependent. The value of the threshold cannot be
arbitrarily small; otherwise, too many false edges can be detected. If the
magnitude of weak edges is much smaller than that of other edges, those edges
are considered insignificant, but in some cases weak edges are more important
than strong edges. For example, it was shown that in the segmentation of MRI of
the knee, the cartilage is better characterized by weak edges than by strong
edges for the separation from the tibia and femur [11].

This note shows that an iterative approach based on the successive conjugation 
and adaptive mollification can detect all edges without any prior threshold. This 
approach is similar to the iterative method in the context of the radial basis 
function method [3, 9, 10]. The iterative method is as follows: at each iteration 
step, all previously found edges are smoothed by a local mollification and new 
corresponding Fourier coefficients are computed. By applying conjugation and 
mollification successively, one can distinguish real edges from fake edges. This 
approach is useful and effective particularly for problems where the weak jump 
can significantly affect the global solution of differential equations or images where 
the interesting structure is represented by the weak edges [11]. 

In Sect. 2, a brief explanation of the Fourier concentration method is given. In
Sect. 3, the proposed iterative method is explained based on the adaptive filtering
method. The stopping criterion is also explained, and numerical examples with remarks
are given. In Sect. 4, a brief concluding remark is provided.


2 Edge Detection Using Fourier Concentration Method 


Let $[f](x) = f(x^+) - f(x^-)$ denote the jump function of $f(x) \in L^2[-\pi, \pi]$,
where the superscripts + and - denote the limits taken from the right and
left, respectively. Given a finite set of Fourier coefficients, $\{\hat{f}_k\}_{|k| \le N}$, the Fourier
concentration method, developed in [5, 6], computes the concentration as a sum of
the form

$$S_N^\sigma[f](x) = i \sum_{|k| \le N} \hat{f}_k \, \mathrm{sgn}(k) \, \sigma\!\left(\frac{|k|}{N}\right) e^{ikx}, \qquad (1)$$

where the $\sigma(\cdot)$ are known as concentration factors and sgn(k) is the sign function.
Given certain admissibility conditions [6], the sum converges to the jump function:

$$S_N^\sigma[f](x) = [f](x) + \begin{cases} O\!\left(\dfrac{\log N}{N}\right), & d(x) \le \dfrac{\log N}{N}, \\[2mm] O\!\left(\dfrac{\log N}{(N d(x))^s}\right), & d(x) \gg \dfrac{1}{N}, \end{cases} \qquad (2)$$

where d(x) denotes the distance to the nearest edge and s depends on the
concentration factor. Here we note that Eq. (2) shows that the concentration function
$S_N^\sigma[f](x)$ recovers the jump function of f(x) as $N \to \infty$, although the convergence
may be slow. Equation (2) also implies that the absolute maximum value of the
concentration function $S_N^\sigma[f](x)$ converges to the maximum jump. Accordingly,
we observe that strong jumps are relatively easier to detect than weak jumps. The
common types of concentration factors satisfying the admissibility conditions are
polynomial concentrations


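$$\sigma(\eta) = p\,\eta^p, \qquad (3)$$
where p is a positive integer and $\eta = |k|/N$, and exponential concentration functions
$$\sigma(\eta) = C \eta \exp\!\left(\frac{1}{\alpha\,\eta(\eta - 1)}\right), \qquad (4)$$

where $\alpha > 0$ is an order and C is a normalization constant. Cutoffs for edge
detection, $\tau \in (0, 1]$, are with respect to the normalized concentration

$$\hat{S}(x) = \frac{|S_N^\sigma[f](x)|}{\max_x |S_N^\sigma[f](x)|}$$

and the edge set, E, is defined as
$$E = \{x \mid \hat{S}(x) > \tau,\ x \in [-\pi, \pi]\}. \qquad (5)$$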
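As an illustration of Eqs. (1), (3) and (5), the following minimal sketch evaluates the concentration sum for a simple test function and thresholds the normalized concentration. The test function (a unit box on $(-\pi/2, \pi/2)$), the grid size `M`, the choice `p = 1` and all function names are assumptions made for this example only; they are not taken from the experiments reported here.

```python
import cmath
import math


def step_coefficients(N):
    """Fourier coefficients (1/2pi) * int f(x) e^{-ikx} dx of the assumed test
    function f(x) = 1 on (-pi/2, pi/2) and 0 elsewhere on [-pi, pi]."""
    coeffs = {}
    for k in range(-N, N + 1):
        coeffs[k] = 0.5 if k == 0 else math.sin(k * math.pi / 2) / (math.pi * k)
    return coeffs


def concentration(coeffs, N, x, p=1):
    """Concentration sum of Eq. (1) with the polynomial factor p * eta**p."""
    total = 0j
    for k in range(-N, N + 1):
        if k == 0:
            continue  # sgn(0) = 0, so the k = 0 term drops out
        eta = abs(k) / N
        sigma = p * eta ** p
        sign = 1 if k > 0 else -1
        total += coeffs[k] * sign * sigma * cmath.exp(1j * k * x)
    return (1j * total).real  # the imaginary part vanishes for real f


def detect_edges(coeffs, N, tau=0.4, M=256, p=1):
    """Grid points where the normalized concentration exceeds the cutoff tau."""
    grid = [-math.pi + 2 * math.pi * j / M for j in range(M)]
    S = [concentration(coeffs, N, x, p) for x in grid]
    peak = max(abs(s) for s in S)
    return [x for x, s in zip(grid, S) if abs(s) / peak > tau]


N = 64
edges = detect_edges(step_coefficients(N), N)
```

With these assumed settings the detected points cluster around the two jumps at $x = \pm\pi/2$; because the concentration is normalized by its maximum, the overall scaling of the concentration factor does not affect the edge set.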


Several approaches have been developed for improving the concentration method;
we refer readers to [1, 2, 4, 6, 7, 12, 13, 15, 16] for some of them. All these methods basically
utilize the edge map. Figure 1 shows the result of the Fourier concentration method for $f_1(x)$,


JU JU 

3 45953 

—2 n<x< 40 

= — 4 
fi(x) = 6 
ЛО) Di z ed (6) 


0 otherwise

Fig. 1 Edge detection for $f_1(x)$ with $\tau = 0.4$. (a and b) with the polynomial concentration
with $p = 5$. (c and d) with the exponential concentration with $p = 12$ for
$\sigma = \exp(-\epsilon_M (1 - |k|/N)^p)$ and $\epsilon_M = 64$. (a and c) $\hat{S}(x)$. (b and d) the detected edges marked
by red cross symbols. $N = 128$


where $m_w = 0.1$. That is, the magnitude of the strong jump is 30 times that of the weakest.
As clearly shown in the figure, it is hard to detect the weak edge by looking at
the normalized Fourier concentration $\hat{S}(x)$, (a) and (c) in Fig. 1, although the weak
edges are clearly visible in (b) and (d) in Fig. 1.

3 Iterative Concentration Method


As clearly seen in Fig. 1, the Fourier concentration method may fail to detect the 
weak edge when the concentration of the weak edge is too small compared to the 
strong edge. To find all edges, we propose to apply the Fourier concentration method 
iteratively based on the local mollification using the local adaptive filtering method. 


3.1 Local Adaptive Mollification 


The local adaptive mollification is a key step for the iterative algorithm. Consider a
smooth function $\phi \in C_0^\infty[0, 2\pi]$ which is compactly supported such that

$$\phi_\epsilon(x) = \frac{1}{\epsilon}\,\phi\!\left(\frac{x}{\epsilon}\right), \qquad (7)$$

where $\lim_{\epsilon \to 0^+} \phi_\epsilon(x) = \delta(x)$. Here $\delta(x)$ is the Dirac delta function. Furthermore,
$\int_{-\infty}^{\infty} \phi(x)\,dx = 1$. With these properties, the limit property is given by

$$\lim_{\epsilon \to 0} (\phi_\epsilon * f)(x) = f(x),$$

where $*$ denotes convolution. The parameter $\epsilon$ is free; it localizes
the convolution and is known as the localization factor. When $\epsilon$ is a fixed
value for every x, a global smoothing occurs everywhere, including both the
nonsmooth and smooth areas. However, we only want to apply the mollification
locally to minimize the Gibbs oscillations near the jump. In order to achieve this,
we use a two-parameter family of the spectral mollifier introduced by Gottlieb and
Tadmor [8]. Consider the convolution of the Fourier partial sum $f_N(x)$ and the
mollifier $\phi_\epsilon$. Then by the definition of $f_N(x)$ and $\phi_\epsilon$ we have


$$(\phi_\epsilon * f_N)(x) = \frac{1}{2\pi} \int_0^{2\pi} \phi_\epsilon(x - y)\, f_N(y)\, dy = \frac{1}{2\pi} \int_0^{2\pi} \phi_\epsilon(x - y)\, (D_N * f)(y)\, dy,$$


where $D_N$ is the Dirichlet kernel of degree N. The idea proposed in [8] is that one
changes the degree of the Dirichlet kernel with the localization parameter $\epsilon$ so that
the two-parameter family of the new mollifier is defined by

$$\phi_{p,\epsilon}(x) = \frac{1}{\epsilon}\,\phi\!\left(\frac{x}{\epsilon}\right) D_p\!\left(\frac{x}{\epsilon}\right), \qquad (8)$$
where $D_p$ is the Dirichlet kernel of degree p. Then for all s, the error is given by [8]


1 5+1 2 5 
[фь,е * уб) — РО) < Clo 1 |" (=) +p (5) irg] ; 
€ p 
(9) 


where $\|\cdot\|_{L^\infty_{loc}} = \sup_{(x - \epsilon\pi,\, x + \epsilon\pi)} |\cdot|$. The first term on the right-hand side of the
above inequality is the truncation error and the second term is the regularization
error. As we see, the optimization of the error is determined by how the localization
parameter $\epsilon$ and the degree p are balanced. In [8] those parameters were chosen
such that $\epsilon = d$, where d is the distance to the nearest jump from the current
position. The order of the Dirichlet kernel is chosen such that spectral convergence
is achieved, say, $p = \sqrt{N}$. Here we note that a modification of the two-parameter
mollifier for the enhancement of the convergence was proposed in [14], which was
designed to reduce the Gibbs oscillations while providing a sharp reconstruction up
to the edge. Note that the adaptive mollifier was used to sharpen the concentration
map $S$ in [2].

Our proposed iterative method is that once the edge is identified, the edge region
is first localized using the value of $\epsilon$ so that $f_N$ in the region away from the detected
edge is not affected by the mollification. This helps the next available edge, if it exists, to be
preserved through the mollification of $f_N$. Thus, as in [8], the localization
factor is a function of the distance from the edge, d, i.e. $\epsilon = \epsilon(d)$. Then we
adaptively mollify $f_N$ so that a heavy mollification using p is applied to reduce
the Gibbs oscillations near the edge. The limit property of p is given as $p \to 0$ if
$d \to 0$ and $p \to \infty$ if $d \to 2\pi$. In this work, we use the local adaptive filtering for
the mollification.


3.2 Almost Automatic Stopping of the Iteration 


To see that the proposed method stops almost automatically, consider $f(x) = x$, $x \in
[-\pi, \pi]$, with the Fourier coefficients

$$\hat{f}_k = \frac{i(-1)^k}{k}, \quad k \ne 0,$$

and $\hat{f}_0 = 0$. There are two edges ($x = \pm\pi$), and conjugation and local adaptive
filtering have the most effect at $\pm\pi$. Therefore, by considering the local behavior
near $\pm\pi$, we assume a constant order of filtering p and of conjugation q, with
functional forms of $\exp(-\epsilon_M (1 - |k/N|)^p)$ and $\exp(-\epsilon_M |k/N|^q)$, respectively.
By letting $\phi_{p,\epsilon}$ and $C_N$ be the corresponding kernels and letting $S$ and $F$ denote
conjugation and filtering,

$$C_N * (\phi_{p,\epsilon} * f_N) \approx \phi_{p,\epsilon} * (C_N * f_N)$$

and after some simplification, we have

$$S[F[f_N]](x) = C_N * \phi_{p,\epsilon} * f_N = \sum_{k=-N,\, k \ne 0}^{N} \frac{(-1)^{k+1}}{|k|} \exp\!\left(-\epsilon_M \left(1 - \frac{|k|}{N}\right)^p\right) \exp\!\left(-\epsilon_M |k/N|^q\right) e^{ikx}.$$

This has Fourier coefficients

$$\widehat{S[F[f_N]]}_k = \frac{(-1)^{k+1}}{|k|} \exp\!\left(-\epsilon_M \left(1 - \frac{|k|}{N}\right)^p\right) \exp\!\left(-\epsilon_M |k/N|^q\right).$$

From the sharp localization and heavy filtering near $\pm\pi$, we choose $p, q \to 0$.
Then setting $y = |k/N|$ yields

$$\left|\widehat{S[F[f_N]]}_k\right| \sim \frac{1}{|k|} \exp\!\left(-\epsilon_M \left[y^q + (1 - y)^p\right]\right), \quad k \ne 0,$$

which approaches 0 exponentially. Thus after all the edges are found through
iteration, the concentration becomes exponentially small. Thus if the stopping criterion
$\eta$ below is chosen small enough, e.g. $\eta \approx 10^{-10}$, the stopping of the iteration is
guaranteed:

$$|S[F[f_N]]| < \eta. \qquad (10)$$
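The size of the coefficient estimate above can be checked numerically. The sketch below evaluates $\frac{1}{|k|}\exp(-\epsilon_M[y^q + (1-y)^p])$ for small positive p and q and compares the largest value with a tolerance $\eta$. The values `eps_m = 36` (so that $e^{-\epsilon_M}$ is near double-precision round-off), `p = q = 0.01`, `N = 128`, and the use of the coefficient magnitudes as a stand-in for the quantity in Eq. (10) are all assumptions for illustration.

```python
import math


def residual_coefficient(k, N, p, q, eps_m=36.0):
    """Magnitude (1/|k|) * exp(-eps_M * [y**q + (1 - y)**p]) with y = |k|/N."""
    y = abs(k) / N
    return math.exp(-eps_m * (y ** q + (1.0 - y) ** p)) / abs(k)


def should_stop(N, p, q, eta=1e-10, eps_m=36.0):
    """Compare the largest coefficient magnitude with the tolerance eta."""
    worst = max(residual_coefficient(k, N, p, q, eps_m) for k in range(1, N + 1))
    return worst <= eta, worst


# Small positive orders stand in for the limit p, q -> 0 (an assumption).
stop, worst = should_stop(N=128, p=0.01, q=0.01)
```

Since $y^q + (1-y)^p \ge 1$ on $[0, 1]$ for $0 < p, q \le 1$, every coefficient is bounded by $e^{-\epsilon_M}/|k|$, which is why a tolerance such as $10^{-10}$ is met with a wide margin in this setting.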


3.3 Numerical Examples 


We consider the case where the magnitude of the weak edge is very small for the
function $f_1(x)$ in Eq. (6),


$m_w = 0.01$.


Figure 2 shows how the iterative method finds edges. The order in which the
edges are found is from left to right (see red arrows). As shown in the figure, the iterative
method finds all edges even with $m_w = 0.01$. It is interesting to observe that the
weakest edges are found in the 3rd and 7th iteration steps, before all the strong edges
are found. Now we consider an even smaller value of $m_w$,


$m_w = 0.001$.

Fig. 2 $f_1(x)$ with $m_w = 0.01$ and successive edge detection. Left: $\hat{S}(x)$. Right: edges found in
each iteration marked by red cross symbols, with the weak edges circled in green. Note that the
weak edges are almost invisible in the right figure of $f_1(x)$. Each figure on the left shows $\hat{S}(x)$ after
each iteration, from left to right. Each figure on the right shows the actual function and detected
edges, from left to right


Figure 3 shows how the iterative method finds all edges; the result is similar
to that in Fig. 2. As shown in the figure, the method is highly accurate and finds
all edges, including the very weak edges.

As an application to the solution of PDEs, namely the shock-density wave
interaction equation, we consider finding shocks in the density profile at $t = 2$ with
the total number of grid points $N = 300$, computed with the WENO-Z method used
in [9]. The left two figures of Fig. 4 show the edges (shocks) found by the Fourier
concentration method, while the right figure shows the edges (shocks) found by the
iterative method. As shown in the figure, the iterative method finds all the physical
shocks accurately, while the Fourier concentration method misses some of the shocks.

For two-dimensional examples, we consider a Shepp-Logan image with a faint
box added to comprise additional weak edges, and a brain image. To detect edges in
two dimensions, edges are detected slicewise in the x and y directions. The x and y
coordinates have a range of $[-\pi, \pi]$. For a $(2N_x + 1) \times (2N_y + 1)$ image, slices of f(x, y)
are taken at evenly spaced x and y with $\Delta x = 2\pi/(2N_x + 1)$ and $\Delta y = 2\pi/(2N_y + 1)$,
with $-\pi$ included and $\pi$ excluded. Within each slice, Fourier coefficients
are computed by partial Fourier expansion and the iterative method is applied
to find strong and weak edges. Calculation parameters for the two-dimensional
calculations were similar to those for the one-dimensional calculations. An edge
with a concentration magnitude at or above a fraction $\tau = 0.1$ of the maximum
magnitude concentration was considered strong. To detect strong edges, trigonometric
concentration factors with $\alpha = \pi$ were used. Figure 5 shows the edges found
by the proposed method for the Shepp-Logan image. As shown in the figure, the weak
edges (square box with magnitude of 0.01) are successfully found by the method.


Remarks First, the proposed method is affected by noise, as is the original Fourier
concentration method. Consider Eq. (6). Let $\hat{f}_k(m_w = 0)$ be the Fourier coefficients with $m_w =$
0. Then (ть) = Mw /8 for k = 0 and f (My) = Tur [sin(7kz/4) — sin(3kz/2)] 
for К 5 0. The weak edge translates fre Thus we expect that unless SN 1$ high 
enough, $|\hat{f}_k(m_w = 0) - \hat{f}_k(m_w)|$ easily becomes smaller than the noise as $m_w$
decays. Figure 6 shows the concentration with $m_w = 0$ (left), the concentration
with SNR = 20 (middle) and with SNR = 10 (right). As shown in the figure, the weak edges
easily become indistinguishable as the SNR decreases. As the main objective of this research is
finding the weak edges, a proper noise reduction suitable for the proposed method
should be investigated in our future research. Second, the proposed method is to find
the edges from the Fourier coefficients $\hat{f}_k$. Figure 7 shows the edge detection in the physical domain, with the
forward difference, generated from the Fourier data (the left figure). The right figure
shows the plot restricted to the interval where the weak edges exist. As shown in the figures,
the weak edges are still hard to distinguish in the physical domain. Once the strong
edges are removed in the Fourier domain, by switching back and forth between the
Fourier and physical domains, the weak edges are eventually found with the proposed
method.

Fig. 4 The density $\rho$ on the y-axis versus the x-coordinate on the x-axis at t = 1 for the shock-density
wave interaction, and shocks (with cross symbols) found with different values of $\tau$ and p. Left two
figures: the Fourier concentration method. Right: the iterative method

Fig. 5 Original image (left) and edges (right) detected for concentration followed by iterative
method on the Shepp-Logan image with a weak square edge of magnitude 0.01 added

Fig. 6 Concentration on the y-axis versus x. Left: without noise. Middle: SNR = 20. Right: SNR = 10

Fig. 7 Left: edges with finite difference in the physical domain. Right: x <x< m


Summary The following is the summary of the proposed iterative concentration
method. The procedure stops eventually with a non-zero value of $\eta$ in
Eq. (10).


* Step 1: Find edge locations $x_0$ using the Fourier concentration method.

* Step 2: Apply the local filter near $x_0$ and find the new set of Fourier coefficients.

* Step 3: Find a new edge location $x_1$, where the normalized concentration $\hat{S}$ computed from
the coefficients $\{\hat{f}_k\}$ of Step 2 attains its maximum.

* Step 4: Repeat Steps 2 and 3 until all the edges are found (the iteration stops once
all edges are found).


4 Conclusion 


We showed that the iterative approach of the Fourier concentration method can
detect all edges, whereas the standard concentration method fails when the weak edges are too small. We showed
that the proposed method is able to detect weak edges 3000 times weaker than the
strongest edge, as long as the data are noise-free and the weak edges are well separated from the stronger
edges, and that the proposed method finds all weak edges in a PDE
application, namely the WENO calculation for the shock-density wave interaction.
The iterative method also stops almost automatically after all the edges
are found. Thus the proposed method is accurate and efficient.


Small Trees for High Order Whitney Elements


Ana Alonso Rodríguez and Francesca Rapetti 


1 Introduction 


We aim at determining in a constructive way, for the high order case, the finite
element solutions of grad $\phi$ = E, curl A = B, div D = $\rho$, namely, of the equations
linking the electric field E, the magnetic induction B, and the electric charge density
$\rho$, to their potentials $\phi$, A and D, respectively. Stating the necessary and sufficient
conditions for assuring that a function defined in a bounded set $\Omega \subset \mathbb{R}^3$ is the
gradient of a scalar potential, the curl of a vector potential or the divergence of a
vector field is one of the most classical problems of vector analysis (see for example
[3, 6, 8]). We aim at providing an explicit and efficient procedure to construct a
finite element solution. For example, div-free fields, W, are implicitly characterized
in terms of a vector w of degrees of freedom of W by the algebraic constraint
Dw = 0, with D the matrix of the div operator between finite element spaces.
The same fields, in the case of a domain with connected boundary, are explicitly
defined by w = Ra, with no constraint on a, where R is the matrix of the curl
operator between finite element spaces and a collects the degrees of freedom of the
vector potentials A. Similarly, one can wish to compute a vector potential a such
that Ra = b, for a given field b verifying Db = 0. As explained in [5], these bases
can be constructed with the help of “trees” and “co-trees”, which are at the core of
this contribution. The case r = 0 is largely treated in the literature for different
types of topological domains (see for example [2]). In these pages, we develop the
tree and co-tree approaches for r > 0 when fields in the high order Whitney spaces
are represented on the basis of their weights on small simplices [7, 9, 10]. With this
choice of degrees of freedom, the tree and co-tree concepts extend from r = 0 to
r > 0 straightforwardly.


2 Basic Concepts 


Let $\Omega \subset \mathbb{R}^3$ be a bounded polyhedral domain with Lipschitz boundary $\partial\Omega$ and
$\mathcal{M}$ a simplicial mesh of $\Omega$. We denote by |A| the cardinality of the set A. For
$0 \le k \le 3$, let $\Delta_k(T)$ (resp. $\Delta_k(\mathcal{M})$) be the set of k-simplices of a mesh tetrahedron
T (resp. of the mesh $\mathcal{M}$). Note that $\Delta_k(\mathcal{M}) = \bigcup_{T \in \mathcal{M}} \Delta_k(T)$. If $\Delta_0(\mathcal{M}) = \{v_i\}_i$,
with $i = 1, \dots, N_V$, where $N_V = |\Delta_0(\mathcal{M})|$, then each k-simplex $S \in \Delta_k(\mathcal{M})$ has
an associated increasing map $m_S : \{0, \dots, k\} \to \{1, \dots, N_V\}$. This map induces
an (inner) orientation on S (i.e., a way to run along S if k = 1, through S if k = 2,
in S if k = 3).

If we assign to each $S \in \Delta_k(\mathcal{M})$ a real number $c_S$, we can define the k-chain
$c = \sum_{S \in \Delta_k(\mathcal{M})} c_S\, S$, i.e. a formal weighted sum of k-simplices S in $\mathcal{M}$. One can
add k-chains, namely $(c + \tilde{c}) = \sum_S (c_S + \tilde{c}_S)\, S$, and multiply a k-chain by a scalar
p, namely $p\,c = \sum_S (p\,c_S)\, S$. The set of all k-chains in $\mathcal{M}$, here denoted $C_k(\mathcal{M})$,
is a vector space, in one-to-one correspondence with the set of real vectors
$c = (c_S)_{S \in \Delta_k(\mathcal{M})}$. Each k-simplex $S \in \Delta_k(\mathcal{M})$ can be associated with the elementary
k-chain c with entries $c_S = 1$ and $c_{S'} = 0$ for $S' \ne S$. In the following we will use
the same symbol S to denote the oriented k-simplex and the associated elementary
k-chain.

The boundary operator $\partial$ takes a k-simplex S and returns the sum of all its
(k - 1)-faces f with coefficient 1 or -1 depending on whether the orientation
of the (k - 1)-face f matches or not the orientation induced by that of the
simplex S on f. Since the boundary operator is a linear mapping from $C_k(\mathcal{M})$ to
$C_{k-1}(\mathcal{M})$, it can be represented by a matrix $\partial$ of dimension $|\Delta_{k-1}(\mathcal{M})| \times |\Delta_k(\mathcal{M})|$,
which is rather sparse, gathering the coefficients 0, -1, or +1. Note that in three
dimensions, there are three nontrivial boundary operators acting, respectively, on
edges, triangles and tetrahedra: $\partial_1$ represented by the matrix $G^\top$, $\partial_2$ represented by
$R^\top$, and $\partial_3$ represented by $D^\top$. To fully specify $\partial$, we need to specify the boundary
of each simplex S. By definition, we have

$$\partial_1 e = \sum_{n \in \Delta_0(\mathcal{M})} G_{e,n}\, n, \qquad \partial_2 f = \sum_{e \in \Delta_1(\mathcal{M})} R_{f,e}\, e, \qquad \partial_3 T = \sum_{f \in \Delta_2(\mathcal{M})} D_{T,f}\, f,$$

for any $e \in \Delta_1(\mathcal{M})$, any $f \in \Delta_2(\mathcal{M})$ and any $T \in \Delta_3(\mathcal{M})$. For $e = [v_0, v_1]$,
$f = [v_0, v_1, v_2]$ and $T = [v_0, v_1, v_2, v_3]$, we have, respectively,

$$\partial_1 [v_0, v_1] = v_0 - v_1, \qquad \partial_2 [v_0, v_1, v_2] = [v_0, v_1] - [v_0, v_2] + [v_1, v_2],$$
$$\partial_3 [v_0, v_1, v_2, v_3] = [v_0, v_1, v_2] - [v_0, v_1, v_3] + [v_0, v_2, v_3] - [v_1, v_2, v_3].$$

The subscript is removed when there is no ambiguity, since the operator needed
for a particular operation is indicated by the type of the operand (e.g., $\partial_3$ when
$\partial$ applies to tetrahedra). The notion of boundary can be extended to k-chains by
linearity, $\partial c = \partial\left(\sum_{S \in \Delta_k(\mathcal{M})} c_S\, S\right) = \sum_{S \in \Delta_k(\mathcal{M})} c_S\, \partial S$.
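The boundary formulas above lend themselves to a direct check. The following minimal sketch stores a chain as a dictionary from oriented simplices (tuples of increasing vertex labels) to coefficients. The sign rule $(-1)^{k-i}$ for the face omitting the i-th vertex is inferred here from the three displayed formulas, and the function names are hypothetical.

```python
from collections import defaultdict


def boundary(simplex):
    """Boundary of an oriented k-simplex given as a tuple of vertex labels.

    The face obtained by dropping vertex i gets the sign (-1)**(k - i),
    a rule inferred from the displayed formulas for k = 1, 2, 3.
    """
    k = len(simplex) - 1
    chain = {}
    for i in range(k + 1):
        face = simplex[:i] + simplex[i + 1:]
        chain[face] = (-1) ** (k - i)
    return chain


def chain_boundary(chain):
    """Extend the boundary to a chain {simplex: coefficient} by linearity."""
    result = defaultdict(int)
    for simplex, coeff in chain.items():
        for face, sign in boundary(simplex).items():
            result[face] += coeff * sign
    # Drop faces whose coefficients cancel.
    return {face: c for face, c in result.items() if c != 0}


edge_boundary = boundary((0, 1))
triangle_boundary = boundary((0, 1, 2))
double_boundary = chain_boundary(boundary((0, 1, 2, 3)))
```

Applying `chain_boundary` to the boundary of a triangle or a tetrahedron returns the empty chain, which is the property $\partial\partial = 0$ used in what follows.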

We say that a k-chain c is closed if $\partial_k c = 0$. Non-trivial closed k-chains are
called k-cycles and constitute the subspace $Z_k(\mathcal{M}) = \mathrm{Ker}(\partial_k; C_k(\mathcal{M}))$. A k-chain
c is a boundary if there exists a (k + 1)-chain $\gamma$ such that $c = \partial_{k+1}\gamma$. The
k-boundaries constitute the subspace $B_k(\mathcal{M}) = \partial_{k+1} C_{k+1}(\mathcal{M})$. From the property
$\partial\partial = 0$, we know that boundaries are cycles but not all cycles are boundaries,
and we have $B_k(\mathcal{M}) \subset Z_k(\mathcal{M})$. The quotient space $\mathcal{H}_k(\mathcal{M}) = [Z_k(\mathcal{M})/B_k(\mathcal{M})]$
is the homology space of order k of the mesh $\mathcal{M}$, and the Betti number is $b_k =
\mathrm{rank}[\mathcal{H}_k(\mathcal{M})]$. The presence of curl-free fields (resp. div-free fields) that are not the
gradient of a scalar field (resp. the curl of a vector field) is indicated by the fact
that $b_1 \ne 0$ (resp. $b_2 \ne 0$). We recall that Betti numbers are topological invariants
(i.e., they depend on the domain $\Omega$ up to a homeomorphism) and do not depend
on the mesh $\mathcal{M}$ on $\Omega$ that is used to compute them (see [12] and an application in
[11]).

For the high order case, we need to introduce some concepts of relative
homology. Let $K_k(\mathcal{M})$ be subspaces of $C_k(\mathcal{M})$ with $\partial_k K_k(\mathcal{M}) \subset K_{k-1}(\mathcal{M})$. We
thus say that $c \in C_k(\mathcal{M})$ is closed [modulo $K_{k-1}(\mathcal{M})$] if $\partial c \in K_{k-1}(\mathcal{M})$. A (k - 1)-chain
c bounds [modulo $K_{k-1}(\mathcal{M})$] if there exists a k-chain $\gamma$ such that $c - \partial\gamma \in K_{k-1}(\mathcal{M})$.
We thus talk about relative homology groups.

A k-cochain w (over the mesh $\mathcal{M}$) is a linear mapping from $C_k(\mathcal{M})$ to $\mathbb{R}$. Cochains
are discrete analogues of differential forms. For k > 0, the exterior derivative of the
(k - 1)-form w is the k-form dw such that $\int_s dw = \int_{\partial s} w$ for all $s \in C_k(\mathcal{M})$. With
this simple equation relating the evaluation of dw on a simplex s to the evaluation
of w on the boundary of this simplex, the exterior derivative is readily defined. We
can naturally extend the notion of evaluation of a differential form w on an arbitrary
chain by linearity: $\int_{\sum_i c_i s_i} w = \sum_i c_i \int_{s_i} w$. Thus

$$\int_{\sum_i c_i s_i} dw = \int_{\partial\left(\sum_i c_i s_i\right)} w = \int_{\sum_i c_i\, \partial s_i} w = \sum_i c_i \int_{\partial s_i} w.$$

The operator d is the dual of the boundary operator $\partial$. As a corollary of the boundary
operator property $\partial\partial = 0$, we have that dd = 0. Since we used arrays of dimension
$|\Delta_k(\mathcal{M})|$ to represent a k-cochain, the operator d can be represented by a matrix
d of dimension $|\Delta_k(\mathcal{M})| \times |\Delta_{k-1}(\mathcal{M})|$, $1 \le k \le 3$. Again, we have one matrix
for the exterior derivative operator for each simplex dimension. When a metric is
introduced on the ambient affine space, the exterior derivative operator d stands for
grad, curl, div, according to the value of k from 1 to 3, and it is represented by,
respectively, G, R, D, the connectivity matrices of the mesh $\mathcal{M}$.
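To make the duality between d and $\partial$ concrete, the sketch below assembles G, R and D for a mesh consisting of a single tetrahedron and checks that RG = 0 and DR = 0. The one-tetrahedron mesh, the sign rule $(-1)^{k-i}$ (matching the boundary formulas of this section) and the helper names are assumptions for illustration.

```python
from itertools import combinations


def boundary(simplex):
    """Signed (k-1)-faces of an oriented k-simplex, sign (-1)**(k - i)."""
    k = len(simplex) - 1
    return {simplex[:i] + simplex[i + 1:]: (-1) ** (k - i) for i in range(k + 1)}


def derivative_matrix(k_simplices, faces):
    """Matrix of d: one row per k-simplex, one column per (k-1)-simplex.

    It is the transpose of the matrix of the boundary operator.
    """
    col = {f: j for j, f in enumerate(faces)}
    rows = []
    for s in k_simplices:
        row = [0] * len(faces)
        for f, sign in boundary(s).items():
            row[col[f]] = sign
        rows.append(row)
    return rows


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, colv)) for colv in zip(*B)] for row in A]


vertices = (0, 1, 2, 3)
simplices = {k: list(combinations(vertices, k + 1)) for k in range(4)}
G = derivative_matrix(simplices[1], simplices[0])  # grad: 6 edges x 4 nodes
R = derivative_matrix(simplices[2], simplices[1])  # curl: 4 faces x 6 edges
D = derivative_matrix(simplices[3], simplices[2])  # div: 1 tetrahedron x 4 faces
```

The products `matmul(R, G)` and `matmul(D, R)` vanish entry by entry, which is the matrix form of dd = 0.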
3 Small Simplices, Weights and Potentials


We introduce the multi-index $\alpha = (\alpha_0, \dots, \alpha_s)$ of s + 1 integers $\alpha_i \ge 0$ and weight
$|\alpha| = \sum_{i=0}^{s} \alpha_i$. The set of multi-indices $\alpha$ with s + 1 components and weight r is
denoted $\mathcal{I}(s + 1, r)$. We denote by $v_i$ the (Cartesian) coordinates of the node $n_i$ in
$\mathbb{R}^3$. Given a multi-index $\alpha \in \mathcal{I}(4, r)$ and a k-subsimplex S of T, the small simplex
$\{\alpha, S\}$ is the k-simplex that belongs to the small tetrahedron with barycenter at the
point of coordinates $\sum_{i=0}^{3} \left[\left(\tfrac{1}{4} + \alpha_i\right) v_i\right]/(r + 1)$, which is parallel and 1/(r + 1)-
homothetic to the (big) sub-simplex S of T. The notation $\{\alpha, S\}$ was first defined in
[9]. The set of small tetrahedra of order r + 1 > 1 can be visualized starting from the
principal lattice $L_{r+1}(T)$ in the simplex $T = \{n_{\sigma(0)}, n_{\sigma(1)}, n_{\sigma(2)}, n_{\sigma(3)}\}$, defined as

$$L_{r+1}(T) = \left\{ x \in T : \lambda_{\sigma(j)}(x) \in \left\{0, \frac{1}{r+1}, \frac{2}{r+1}, \dots, \frac{r}{r+1}, 1\right\},\ 0 \le j \le 3 \right\},$$

and connecting its points by edges parallel to those of T. (See, e.g., Fig. 1.)

We denote by $\Lambda^k(\Omega)$ the space of all smooth differential k-forms on $\Omega$.
The completion of $\Lambda^k(\Omega)$ in the corresponding norm defines the Hilbert space
$L^2\Lambda^k(\Omega)$. Let $P^-_{r+1}\Lambda^k(T)$ be the space of so-called trimmed polynomial k-forms
of degree r + 1 on T, with $r \ge 0$ (as in [7]), and we define

$$P^-_{r+1}\Lambda^k(\mathcal{M}) = \{w \in H\Lambda^k(\Omega) : w|_T \in P^-_{r+1}\Lambda^k(T),\ \forall T \in \mathcal{M}\},$$

where $H\Lambda^k(\Omega) = \{w \in L^2\Lambda^k(\Omega) : dw \in L^2\Lambda^{k+1}(\Omega)\}$ is a Hilbert space (see [4]).


Definition 1 The weights of a polynomial k-form $u \in P^-_{r+1}\Lambda^k(T)$, with $0 \le k \le 3$
and $r \ge 0$, are the scalar quantities

$$\int_{\{\alpha, S\}} u \qquad (1)$$

on the small simplices $\{\alpha, S\}$ with $\alpha \in \mathcal{I}(4, r)$ and $S \in \Delta_k(T)$.


Fig. 1 From the principal lattice of degree r + 1 = 3 in a tetrahedron T, we define a decomposition
of T into 10 small tetrahedra, 4 octahedra O and 1 reversed tetrahedron. Each face on $\partial T$ is
decomposed into 6 small faces and 3 reversed triangles, in solid red line (Left)

We now list some remarkable properties of the small simplices which are useful
in the tree construction.


Property 1 The weights (1) of a Whitney k-form $u \in P^-_{r+1}\Lambda^k(T)$ on all the small
simplices $\{\alpha, S\}$ of T are unisolvent, as stated in [7, Proposition 3.14]. The small
simplices can thus support the degrees of freedom for fields $u \in P^-_{r+1}\Lambda^k(T)$, with
$0 \le k \le 3$ and $r \ge 0$. Since the result on unisolvence holds true also by replacing
T with $F \in \Delta_{n-1}(T)$, then $\mathrm{Tr}_F\, u \in P^-_{r+1}\Lambda^k(F)$ is uniquely determined by the
weights on small simplices in F. It thus follows that a locally defined u, with $u|_T \in
P^-_{r+1}\Lambda^k(T)$ and single-valued weights, is in $H\Lambda^k(\Omega)$. We thus can use the weights
on the small simplices $\{\alpha, S\}$ as degrees of freedom for the fields in the finite element
space $P^-_{r+1}\Lambda^k(\mathcal{M})$, being aware that their number is greater than the dimension of
the space.


Property 2 The weights given in Definition 1 have a meaning as cochains and this
relates directly the matrix describing the exterior derivative with the matrix of the
boundary operator. The key point is Stokes' theorem $\int_C du = \int_{\partial C} u$, where
u is a (k - 1)-form and C a k-chain. More precisely, if $u \in P^-_{r+1}\Lambda^{k-1}(\mathcal{M})$ then
$z = du \in P^-_{r+1}\Lambda^k(\mathcal{M})$ and

$$\int_{\{\alpha, S\}} z = \int_{\{\alpha, S\}} du = \int_{\partial\{\alpha, S\}} u = \sum_{\{\beta, F\}} B_{\{\alpha, S\}, \{\beta, F\}} \int_{\{\beta, F\}} u,$$

B being the boundary matrix with as many rows as small simplices of dimension k
and as many columns as small simplices of dimension k - 1. The small simplices
$\{\alpha, S\}$ inherit the orientation of the simplex S, so the coefficient $B_{\{\alpha, S\}, \{\beta, F\}}$ is equal
to the coefficient $B_{S,F}$ of the boundary of the simplex S if $\beta = \alpha$. This is
straightforward if dim(F) > 0, and also when dim(F) = 0 provided that small nodes in
T are given in the notation $\{\alpha, n\}$ according to their position in the small simplices
when fragmented (see Fig. 1 in [1]).

Property 3 The generated $\binom{r+2}{2}$ small faces on each face F of T pave F together
with the $\binom{r+1}{2}$ reversed triangles, denoted by $\nabla$, contained in F. Similarly, the
generated $\binom{r+3}{3}$ small tetrahedra contained in T pave T together with the $\binom{r+2}{3}$
octahedra, denoted by O, and the $\binom{r+1}{3}$ reversed tetrahedra, denoted by $\perp$, contained
in T, as shown in Fig. 1. Reversed octahedra and reversed tetrahedra are examples
of “holes” in T (see [9, 10]).
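The counts in Property 3 can be verified numerically. The sketch below enumerates the multi-indices of $\mathcal{I}(4, r)$, one per small tetrahedron, and checks that the pieces fill T by volume: a small tetrahedron has volume $1/(r+1)^3$ of that of T and an octahedron with the same edge vectors has four times the volume of a small tetrahedron (a standard geometric fact used here as an assumption). Function names are hypothetical.

```python
from itertools import product
from math import comb


def multi_indices(n_components, weight):
    """All multi-indices with n_components entries >= 0 that sum to weight."""
    return [a for a in product(range(weight + 1), repeat=n_components)
            if sum(a) == weight]


def tetrahedron_counts(r):
    """Numbers of pieces paving T for the principal lattice of order r + 1."""
    return {
        "small_tetrahedra": comb(r + 3, 3),
        "octahedra": comb(r + 2, 3),
        "reversed_tetrahedra": comb(r + 1, 3),
    }


def volume_in_small_tetrahedra(r):
    """Total volume of all pieces, measured in units of one small tetrahedron."""
    c = tetrahedron_counts(r)
    return c["small_tetrahedra"] + 4 * c["octahedra"] + c["reversed_tetrahedra"]


counts_r2 = tetrahedron_counts(2)
```

For r + 1 = 3 this gives 10 small tetrahedra, 4 octahedra and 1 reversed tetrahedron, in agreement with Fig. 1, and the total volume equals $(r+1)^3 = 27$ small tetrahedra.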


Property 4 Since homology is preserved by homotopy, it is discussed in
[10, Section 3.4] that the relative homology (i.e., the homology [modulo the holes'
boundaries]) of the complex of small simplices is the same as the homology of
$\mathcal{M}$. This property is fundamental to build the tree for high order potentials when
working with small simplices. The homology [modulo the holes' boundaries] can be
translated into matrix notation, by showing that the boundary matrices associated with
the small simplices, “modified” and “completed” (in a sense that we explain in the
next section) by the relations [10, Proposition 3.5], are incidence matrices of a graph.
To apply the theory presented in [10, Section 3.4] in a tetrahedron $T \in \Delta_3(\mathcal{M})$, we
need to introduce, for r > 0, two sets $K_1$ and $K_2$ of chains generated by the small
simplices that belong to the boundary of some hole in T as follows:


• $K_1$ are the chains generated by the boundary of the $\binom{r+1}{2}$ reversed triangles $\nabla \subset F$,
for each $F \in \Delta_2(T)$, and the boundary of three faces out of four on
the boundary $\partial\perp$ of each of the $\binom{r+1}{3}$ reversed tetrahedra $\perp$ in T;

• $K_2$ are the chains generated by 4 out of 8 faces of the $\binom{r+2}{3}$ octahedra O in T.
The involved faces are the small faces belonging to the boundary $\partial O$ deprived of
$\partial O \cap (\Delta_2(T) \cup \partial\perp)$.


The two sets $K_1$ and $K_2$ satisfy the property $\partial_2 K_2 \subset K_1$, decisive to conclude that
the relative homology [modulo the holes' boundaries] of the complex of the small
simplices is the same as the homology of the original mesh $\mathcal{M}$ [10].


4 Trees and Graphs 


As stated in [12], a directed graph $\mathcal{G}$ consists of two sets $\mathcal{N}$ and $\mathcal{A}$ of nodes and
arcs, respectively, subjected to certain incidence relations, collected in the all-vertex
incidence matrix $M^{V} \in \mathbb{Z}^{|\mathcal{N}| \times |\mathcal{A}|}$ as follows:

$$M^{V}_{n,a} = \begin{cases} -1, & \text{if } a \text{ starts from } n, \\ +1, & \text{if } a \text{ ends in } n, \\ 0, & \text{if } a \text{ does not contain } n. \end{cases}$$

An incidence matrix M of the graph $\mathcal{G}$ is any sub-matrix of $M^{V}$ with $|\mathcal{N}| - 1$ rows
and $|\mathcal{A}|$ columns. The node that corresponds to the row of $M^{V}$ that is not in M will
be indicated as the reference node of $\mathcal{G}$. A graph $\mathcal{G}$ is connected if there is a path
between any two of its nodes. A tree $\mathcal{T}$ of a graph $\mathcal{G}$ is a connected acyclic subgraph
of $\mathcal{G}$. A spanning tree $\mathcal{T}_s$ is a tree of $\mathcal{G}$ visiting all its nodes. Any connected graph
$\mathcal{G}$ admits a spanning tree $\mathcal{T}_s$. We now have to particularize these notions for small
simplices. In each tetrahedron T of the oriented mesh $\mathcal{M}$, we consider the small
mesh associated with $L_{r+1}(T)$ composed only of small tetrahedra, for a given r
uniform all over the mesh $\mathcal{M}$. The union of the small meshes for all $T \in \Delta_3(\mathcal{M})$ is
denoted $\mathcal{M}_{st}$.
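The graph notions above can be sketched in a few lines. The example builds the all-vertex incidence matrix of a small directed graph and extracts a spanning tree with a greedy union-find pass; this is one standard way to obtain a spanning tree, used here as an illustration and not necessarily the construction of [11]. The graph (the six edges of one tetrahedron) and the function names are assumptions.

```python
def all_vertex_incidence(num_nodes, arcs):
    """Rows are nodes, columns are arcs (start, end): -1 at start, +1 at end."""
    M = [[0] * len(arcs) for _ in range(num_nodes)]
    for j, (start, end) in enumerate(arcs):
        M[start][j] = -1
        M[end][j] = +1
    return M


def spanning_tree(num_nodes, arcs):
    """Indices of arcs forming a spanning tree of a connected graph."""
    parent = list(range(num_nodes))

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for j, (start, end) in enumerate(arcs):
        a, b = find(start), find(end)
        if a != b:  # the arc joins two components, so it closes no cycle
            parent[a] = b
            tree.append(j)
    return tree


arcs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # edges of one tetrahedron
M_all = all_vertex_incidence(4, arcs)
tree = spanning_tree(4, arcs)
```

The selected arcs number $|\mathcal{N}| - 1$, and the corresponding columns of the incidence matrix with one row removed form a square submatrix of maximal rank, which is the algebraic characterization used in the following paragraphs.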


A (Primal) Small Tree for the Gradient Problem

For r = 0, the graph $\mathcal{G}^1$ has $\mathcal{N} = \Delta_0(\mathcal{M})$ and $\mathcal{A} = \Delta_1(\mathcal{M})$. The boundary matrix
$G^\top$ is the all-vertex incidence matrix of the graph $\mathcal{G}^1$. Extracting a spanning 1-tree
$\mathcal{T}$ from $\mathcal{G}^1$ is equivalent to finding in $G^\top$, minus one row, a submatrix of maximal
rank (see [11] for a suitable and easy way of constructing $\mathcal{T}$). For r > 0, we have
to consider the new graph $\mathcal{G}^1$ with $\mathcal{N} = \Delta_0(\mathcal{M}_{st})$ and $\mathcal{A} = \Delta_1(\mathcal{M}_{st})$. Let $G_{st}^\top$ be
the all-vertex incidence matrix of this new graph $\mathcal{G}^1$. Note that $G_{st}^\top$ results from the
boundary operator $\partial$ on the elementary 1-chains from $\mathcal{M}_{st}$. Extracting a spanning
1-tree $\mathcal{T}_{st}$ from $\mathcal{G}^1$ is equivalent to finding in $G_{st}^\top$, minus one row, a submatrix of
maximal rank. Examples of a spanning 1-tree $\mathcal{T}_{st}$ are given for r + 1 = 2 in the right part of Fig. 2
and for r + 1 = 3 in Fig. 5 (fragmented visualization). Note that we can repeat this
construction in the two-dimensional case.

Fig. 2 (Left) The graph $\mathcal{G}^1$ and a spanning tree in thick line, for r = 1. (Right) A spanning tree
for r = 2 in a fragmented layout


A (Dual) Small Tree for the Divergence Problem

For r = 0, the graph $\mathcal{G}^2$ is built on $\mathcal{M}^*$, the so-called dual mesh of $\mathcal{M}$, as follows.
Let us note that an internal face $F \in \Delta_2(\mathcal{M})$ connects two adjacent tetrahedra
$T_1, T_2 \in \Delta_3(\mathcal{M})$, whereas a boundary face $F_b \in \Delta_2(\mathcal{M})$ connects a tetrahedron
$T_b \in \Delta_3(\mathcal{M})$ and the boundary $\partial\Omega$. We can construct the following connected
(dual) graph $\mathcal{G}^2$: the set of nodes, $\mathcal{N}$, contains the barycenter of any tetrahedron
$T \in \Delta_3(\mathcal{M})$ together with one additional exterior node representing $\partial\Omega$; the set of
arcs, $\mathcal{A}$, contains any face $F \in \Delta_2(\mathcal{M})$. For r = 0, the matrix D associated with the
boundary operator $\partial_3$, acting on $C_3(\mathcal{M})$, is an incidence matrix of the (dual) graph
$\mathcal{G}^2$, with reference node the one corresponding to $\partial\Omega$. Extracting a spanning tree
$\mathcal{T}^*$ from $\mathcal{G}^2$ is equivalent to finding in D a submatrix of maximal rank.

For r > 0, let $R_2$ be the set of small faces chosen as follows: one small face for
each octahedron O contained in $K_2$ (see the right side of Fig. 3 for the dashed small
face in $R_2$ when r + 1 = 2). To construct the graph $\mathcal{G}^2$ for r > 0 we need to consider
$\mathcal{M}^*_{st}$, the dual mesh associated with $\mathcal{M}_{st}$, where the nodes are the small tetrahedra and
the arcs the small faces, apart from the ones in $R_2$. To understand this, we can reason
as follows. For r > 0, we have one arc connecting two small tetrahedra, say $t_i, t_j$,
when

• either $t_i, t_j$ share the same small face f, i.e. $\partial t_i \cap \partial t_j = f$;
• or $t_i, t_j$ have a small face on the boundary of the same octahedron O, i.e. $f_i =
\partial t_i \cap \partial O$ and $f_j = \partial t_j \cap \partial O$ for the same octahedron O.

Fig. 3 The (dual) graph $\mathcal{G}^2$ associated with the small mesh $\mathcal{M}_{st}$ defined in a tetrahedron T for
r = 1: the black dots are the nodes, and curved lines the arcs (Left). The (dual) graph obtained
from it by merging the nodes corresponding to the barycenter of $t_0 = \{(1, 0, 0, 0), T\}$ and of O, thus
eliminating the arc associated with the shaded small face $f^0$ (Right)


See an example of graph G? for Mai; (here M = {T} in the left part of Fig. 3 
for r + 1 = 2, where the node associated with the octahedron О is not a node in 
the graph, but stands to indicate that the four small tetrahedra are connected one to 
the other by one arc because they all have one small face on 0O. Naming tg the 
small tetra with a vertex in vg, К = 0,3, and numbering first the 3 x 4 faces on 
tk ПОТ, called fÉ fori = 1, 2, 3, second those on 0 O (where FP, fe: fe, re are 
the small faces up, left, down, right of д O), we have 


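$$
D_{tmp} =
\begin{array}{c|cccc|cccc}
 & f^0_{1,2,3} & f^1_{1,2,3} & f^2_{1,2,3} & f^3_{1,2,3} & \partial t_0 \cap \partial O & \partial t_1 \cap \partial O & \partial t_2 \cap \partial O & \partial t_3 \cap \partial O \\ \hline
t_0 & 1\ 1\ 1 & & & & -1 & & & \\
t_1 & & 1\ 1\ 1 & & & & -1 & & \\
t_2 & & & 1\ 1\ 1 & & & & -1 & \\
t_3 & & & & 1\ 1\ 1 & & & & -1 \\
O & & & & & 1 & 1 & 1 & 1
\end{array}
$$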


Fig. 4 Example of spanning tree in the (dual) graph $G^2$, namely a selection of acyclic paths made
of arcs, visiting all the nodes of $G^2$ (r = 1, Left and r = 2, Right)

Since the octahedron O is not part of the small mesh $M_{r+1}$, we have to imagine
that its node collapses with the node of one of its neighbouring small tetrahedra,
say $t_0$ with a vertex in $v_0$, and thus that the corresponding arc (i.e. the small face
$\partial t_0 \cap \partial O$, the dashed one in the right part of Fig. 3) is eliminated. From a
matrix point of view, D is obtained by adding the line "O" in $D_{tmp}$ to the line "$t_0$",
and eliminating the column of the small face $\partial t_0 \cap \partial O$, namely


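$$
D =
\begin{array}{c|cccc|ccc}
 & f^0_{1,2,3} & f^1_{1,2,3} & f^2_{1,2,3} & f^3_{1,2,3} & \partial t_1 \cap \partial O & \partial t_2 \cap \partial O & \partial t_3 \cap \partial O \\ \hline
t_0 & 1\ 1\ 1 & & & & 1 & 1 & 1 \\
t_1 & & 1\ 1\ 1 & & & -1 & & \\
t_2 & & & 1\ 1\ 1 & & & -1 & \\
t_3 & & & & 1\ 1\ 1 & & & -1
\end{array}
$$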


(in bold font, the submatrix of maximal rank in D for the spanning tree illustrated
in Fig. 4, left part for r + 1 = 2). To repeat this construction in the two-dimensional
case, when T is a triangle, we have to consider the mesh $M_{r+1}$ of small triangles
in T and the role of the core octahedra O is played by the reversed triangles $\nabla \in$
T. The set $R_2$ is replaced by $R_1$, composed of one small edge for each reversed
triangle $\nabla \in$ Kj. In two dimensions we do not have reversed tetrahedra, therefore
no reversed triangles $\nabla$.

The construction of the spanning tree in $M_{r+1}$ can be done by assembling that of
the geometrical mesh M, namely a spanning tree for the Whitney forms of lower
degree (blue lines in Fig. 5 (Right)), together with local contributions, one from
each element (green lines in Fig. 5 (Right)). Each local contribution results from one
fixed on a reference element which is mapped on the current element (respecting the
orientation). In Fig. 5 (Left), in green/red thick line we have marked the small edges
of a spanning tree in the graph $G^1$, for r = 3, in the reference triangle. The red ones
belong to the spanning tree in the reference triangle, but they are in general omitted
in the spanning tree of $M_{r+1}$ (indeed, they appear only if they are covered by the
blue tree). The small co-tree is in black. A similar construction can be repeated in
3D (both for k = 1 and k = 2) and it reflects the decomposition given, for instance,
in [13] (Sect. 5).

Fig. 5 (Left) In thick colored line, the small edges of the graph $G^1$, for r = 3, that compose a
spanning tree in a reference triangle. (Right) In thick blue line the contribution of the branches of
a spanning tree in a (2D) toy mesh M reported on $M_{r+1}$. In green, the contribution of the small
branches mapped from the green ones in the reference triangle. It is not necessary to report the red
ones since they are either covered by the blue ones or omitted. The co-tree is in black
Non-conforming Elements in Nek5000:
Pressure Preconditioning and Parallel
Performance


A. Peplinski, N. Offermans, P. F. Fischer, and P. Schlatter 


1 Introduction 


One of the most important concerns when numerically solving partial differential
equations is finding the optimal grid on which the solution will be computed.
Unfortunately, in most cases this grid cannot be determined in
advance without a deep understanding of the studied problem. That is why
self-adapting algorithms such as adaptive mesh refinement (AMR) have received
much attention in past decades and have become an important part of many packages
for numerical modelling of fluid dynamics, e.g. [9, 18]. The goal of AMR is to
control the computational error during the simulation by placing higher resolution
grids where they are needed. This makes the numerical modelling more robust, and
gives the possibility to increase the accuracy of numerical simulations at minimal
computational cost. The drawback, however, is increased solver complexity,
which can have negative effects on the parallel code performance, in particular related
to load balancing.

There are a number of different AMR schemes. In the context of the spectral
element method (SEM) [16], in which the discretisation is based on a decomposition
of the computational domain into a number of non-overlapping, high-order
sub-domains called elements, we can distinguish three different categories: mesh
adaptation can mean adjusting the (local) size of an element (r-refinement),
changing the polynomial order in a particular element (p-refinement),
or splitting the element into smaller ones (h-refinement). In this work we concentrate
on an h-refinement framework and its implementation in Nek5000 [8], which is
a highly parallel and efficient SEM solver for the incompressible Navier-Stokes
equations. In its established version, Nek5000 only supports conformal elements at
constant polynomial order throughout the domain.

The present work was started within the EU project CRESTA, where the
non-conforming solver for the advection-diffusion problem was developed and the basic
AMR tasks were implemented using existing external libraries. As h-refinement
affects the element connectivity resulting in non-conforming meshes, a special grid 
manager is required to perform local refinement/coarsening and to build globally 
consistent meshes. For this task the p4est library [1] has been chosen, as it is 
designed to manipulate domains composed of multiple, non-overlapping logical 
cubic sub-domains, which can be represented by a recursive tree structure. This 
library provides element connectivity information for the dual graph, which is later 
manipulated by ParMETIS [10] producing a new element-to-processor mapping. 
The final step of grid refinement/coarsening and redistribution is performed within 
the non-conforming version of Nek5000, which utilises the so-called conforming- 
space/nonconforming-mesh approach based on the previous work of Fischer et al. 
[7, 11]. As the solver complexity grows, special care has been taken to develop
efficient tools that can be used within the AMR framework. A more detailed description
of them and the related scaling tests can be found in [17].

The goal of ExaFLOW is to extend results of CRESTA to the full incompressible 
Navier-Stokes equations focusing on proper adaptation of the pressure precondi- 
tioners for nonconforming SEM. Defining a robust parallel preconditioning strategy 
has received much attention in past decades, as the linear sub-problem associated 
with the divergence-free constraint (pressure-Poisson equation) can become very ill- 
conditioned. In the context of SEM two possible approaches based on the additive 
overlapping Schwarz method [4, 6] and the hybrid Schwarz-multigrid method [5, 12] 
were proposed and implemented in Nek5000, leading to a significant reduction of 
pressure iterations. 

In the present paper, we discuss the modifications necessary to adapt Nek5000 for 
the h-type AMR framework. The article is organised as follows. A short description 
of SEM and pressure preconditioners is given in Sects. 2 and 3. The following
Sects. 4 and 5 describe the algorithmic modifications and parallel performance of
the code. Finally, Sect. 6 provides conclusions and future work.


2 SEM Discretisation of the Navier-Stokes Equations 


We review briefly the discretisation of the incompressible Navier-Stokes equations
to introduce notation and point out the parts of the algorithm that require modification. A
more in-depth derivation can be found in e.g. [4]. The temporal discretisation is
based on a semi-implicit scheme in which the nonlinear term is treated explicitly
and the remaining unsteady Stokes problem is solved implicitly. To avoid spurious
pressure modes our spatial discretisation is based on the $P_N - P_{N-2}$ SEM,
where velocity and pressure spaces are spanned by Lagrangian interpolants on
the Gauss-Lobatto-Legendre (GLL) and Gauss-Legendre (GL) quadrature points,
respectively. Note that the basis for velocity is continuous across element interfaces,
whereas the basis for pressure is not. Assuming $f^n$ incorporates all nonlinear and
source terms treated explicitly at time $t^n$, the matrix form of the Stokes problem
after applying the Uzawa decoupling reads:


$$
\begin{pmatrix} H & -\frac{\Delta t}{\beta_0} H B^{-1} D^T \\ 0 & E \end{pmatrix}
\begin{pmatrix} u^n \\ \Delta p \end{pmatrix}
=
\begin{pmatrix} B f^n + D^T p^{n-1} \\ g \end{pmatrix},
\tag{1}
$$

where

$$
E = \frac{\Delta t}{\beta_0} D B^{-1} D^T \tag{2}
$$

is the Stokes Schur complement governing the pressure, $\Delta p = p^n - p^{n-1}$ is the
pressure update, and g is the inhomogeneity arising from Gaussian elimination.
In these equations $H = \frac{1}{Re} A + \frac{\beta_0}{\Delta t} B$ and D are the discrete Helmholtz and
divergence operators, respectively. Here $\beta_0$, A and B denote a coefficient from the time
derivative, a discrete Laplacian and a diagonal mass matrix associated with the
velocity mesh. Applying the Uzawa decoupling we use the inverse mass matrix $B^{-1}$
as an approximation of the inverse Helmholtz operator $H^{-1}$, giving rise to a splitting
error. Note that for this splitting method the diagonality of the mass matrix B is
crucial to avoid costly matrix inversion.

All operators H, A, B and E are symmetric positive definite (SPD), so the corresponding
systems can be solved with a preconditioned conjugate gradient (PCG) method. Moreover, E has
properties similar to a Poisson operator, and is often referred to as a consistent
Poisson operator. The systems involving H and E are solved iteratively with E
being more challenging, and in the next section we will present the preconditioning
strategy for the pressure equation,

$$
E \Delta p = g. \tag{3}
$$
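
As an illustration of the PCG iteration mentioned above, the following is a minimal sketch in Python, not the Nek5000 implementation. The test matrix is a hypothetical stand-in for the pressure operator (a 1D Laplacian with a strongly varying diagonal scaling), and a simple Jacobi preconditioner takes the place of the Schwarz preconditioners discussed in the next section; the names `pcg`, `apply_A` and `apply_Minv` are assumptions made for this example.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for an SPD operator.

    apply_A and apply_Minv are callables returning A @ v and M^{-1} @ v.
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test operator: 1D Laplacian with a badly scaled diagonal.
n = 100
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
s = np.logspace(0, 3, n)
A = np.sqrt(s)[:, None] * T * np.sqrt(s)[None, :]
g = np.ones(n)

# Unpreconditioned CG (M = I) versus Jacobi-preconditioned CG (M = diag(A)).
x_plain, it_plain = pcg(lambda v: A @ v, g, lambda v: v)
x_jac, it_jac = pcg(lambda v: A @ v, g, lambda v: v / np.diag(A))
print("iterations without / with Jacobi preconditioning:", it_plain, it_jac)
```

The operator and the preconditioner enter only through callables, which mirrors the matrix-free setting of SEM where the global matrices are never assembled.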


We close this section by briefly presenting the SEM operators. SEM introduces
a globally unstructured and locally structured basis by tessellating the domain into
K non-overlapping subdomains (deformed quadrilaterals), $\Omega = \bigcup_{k=1}^{K} \Omega_k$, and
representing functions in each subdomain in terms of tensor-product polynomials
on a reference subdomain $\hat\Omega = [-1, 1]^d$. In this approach every function or operator
is represented by its local counterparts, which in the case of functions takes the form of
a sum over the subdomains

$$
f(\mathbf{x}) = \sum_{k=1}^{K} \sum_{i} f_i^k h_i(\mathbf{r}).
$$

Here, $f_i^k$ and $h_i$ are the nodal values of the function in $\Omega_k$ and the base functions
in $\hat\Omega$ (evaluated at the reference coordinates $\mathbf{r}$), respectively, with i representing the
natural ordering of nodes in $\hat\Omega$. Combining the coefficients $f_i^k$ one can build global
$\underline{f}$ and local $\underline{f}_L$ representations
of the function. Each global degree of freedom occurs only once in the global
representation, but has multiple copies of faces, edges and vertices related to $\Omega_k$ in
the local one. To enforce function continuity, the global-to-local mapping is defined
as the matrix-vector product $\underline{f}_L = Q \underline{f}$, where Q is a binary operator duplicating
the basis coefficients in adjoining subdomains. The action of $Q^T \underline{f}_L$ sums multiple
contributions to the global degree of freedom from their local values. The assembled
global stiffness matrix A takes the form

$$
(\nabla f, \nabla g) = \underline{f}^T A \underline{g} = \underline{f}^T Q^T A_L Q \underline{g},
$$

where the block-diagonal matrix $A_L$ is the unassembled stiffness matrix with each
diagonal block consisting of the local stiffness matrix $A^k_{ij} = \int_{\Omega_k} \nabla h_i \cdot \nabla h_j \, dx$. In
practice, the global stiffness matrix is never formed explicitly, and the gather-scatter
operator $QQ^T$ is used instead. This operator contains all information about element
connectivity.
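
The action of the Boolean operator Q and of the gather-scatter operator can be sketched on a toy example. The snippet below is only an illustration under simple assumptions (two 1D elements with three nodes each, sharing one node); the array `local_to_global` and all variable names are hypothetical and do not correspond to Nek5000 data structures.

```python
import numpy as np

# Two 1D elements with three nodes each; the last node of element 0
# coincides with the first node of element 1, giving 5 global nodes.
local_to_global = np.array([0, 1, 2, 2, 3, 4])
n_local, n_global = local_to_global.size, 5

# Q is Boolean: row l has a single 1 in column local_to_global[l].
Q = np.zeros((n_local, n_global))
Q[np.arange(n_local), local_to_global] = 1.0

f_global = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
f_local = Q @ f_global            # scatter: copy shared values to each element

# Element-wise (unassembled) contributions, e.g. local residuals.
r_local = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
r_global = Q.T @ r_local          # gather: sum contributions at shared nodes
r_dssum = Q @ r_global            # gather-scatter, result stays in local storage

# Matrix-free equivalent of the gather-scatter using index arithmetic only.
acc = np.zeros(n_global)
np.add.at(acc, local_to_global, r_local)
r_dssum_matrix_free = acc[local_to_global]
```

The matrix-free variant reflects the remark in the text that the assembled operators are never formed; only the connectivity map is needed.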


3 Pressure Preconditioner 


An efficient solution of Eq. (3) requires finding an SPD preconditioning matrix $M^{-1}$
which can be inexpensively applied and which reduces the condition number of
$M^{-1}E$. Preconditioners based on domain decomposition are a natural choice for
SEM as the data is structured within an element but is otherwise unstructured.

An overlapping additive Schwarz preconditioner for Eq. (3) was developed in
[4] based on a linear finite element discretisation of the Poisson operator. It combines
solutions of the local Poisson problems in overlapping subdomains, $R_k^T A_k^{-1} R_k$, with
the coarse grid problem $R_0^T A_0^{-1} R_0$, which is solved on few degrees of freedom, but
covers the entire domain

$$
M^{-1} = R_0^T A_0^{-1} R_0 + \sum_{k} R_k^T A_k^{-1} R_k.
$$

For the local problems the restriction and prolongation operators, $R_k$ and $R_k^T$, are
Boolean matrices that transfer data to and from the subdomain, and $A_k$ is a local
stiffness matrix which can be inverted with e.g. a fast diagonalisation method. Note
that the actions of $R_k$ and $R_k^T$ are similar to the gather-scatter operator $QQ^T$.

The coarse grid problem corresponds to the Poisson problem solved on the
element vertices only, with $R_0^T$ being the linear operator interpolating the coarse
grid solution onto the tensor product array of GL points. Unlike in [4, 6], $A_0$ is
defined using local SEM-based Neumann operators performing the projection of
local stiffness matrices $A_k$ evaluated on the GLL quadrature points onto the set
of coarse base functions $b_i$ representing the linear finite element base on the GLL
grid. The coarse base functions are defined in $\hat\Omega$ as a tensor product of the
one-dimensional linear functions. The local contribution to $A_0$ is given by $b_i^T A_k b_j$, and
the full $A_0$ is finally assembled by local-to-global mapping summing contributions
to the global degree of freedom from their local counterparts. $A_0$ is one of few
matrices formed explicitly in Nek5000.

On the other hand, the hybrid Schwarz-multigrid preconditioner is based on the
multiplicative Schwarz method, which for the two-level scheme takes the form,

$$
M^{-1} = R_0^T A_0^{-1} R_0 \left[ \sum_{k} R_k^T A_k^{-1} R_k \right],
$$

and leads to the following two-level multigrid scheme,

(i) $u^1 = \sum_k R_k^T A_k^{-1} R_k \, g$,
(ii) $r = g - A u^1$,
(iii) $e = R_0^T A_0^{-1} R_0 \, r$,
(iv) $u = u^1 + e$,

where g, r, e and u are the right-hand side, residual, coarse-grid error and solution of
the equation $A u = g$, respectively. This method can be extended to a general multilevel
solver performing a full V cycle [5, 12]. Notice that by replacing step (ii) with r = g
we obtain the additive Schwarz preconditioner.
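
The four steps above, and their relation to the additive variant, can be sketched on a small model problem. The following Python example is only an illustration under stated assumptions: a 1D finite-difference Laplacian replaces the SEM pressure operator, the local problems are inverted densely rather than by fast diagonalisation, and the coarse operator is a Galerkin projection onto piecewise-linear hat functions. All names (`blocks`, `R0T`, `local_solves`, `coarse_solve`, `additive`, `hybrid`) are hypothetical.

```python
import numpy as np

n = 63                                   # interior grid points on (0, 1)
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

# Overlapping subdomains: blocks of 16 points extended by 2 points per side.
blocks = [np.arange(max(s - 2, 0), min(s + 16 + 2, n)) for s in range(0, n, 16)]
local_inv = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in blocks]

# Coarse space: piecewise-linear hat functions on 7 coarse interior nodes.
x = np.arange(1, n + 1) * h
xc = np.arange(1, 8) / 8.0
R0T = np.maximum(0.0, 1.0 - np.abs(x[:, None] - xc[None, :]) * 8.0)  # prolongation
A0_inv = np.linalg.inv(R0T.T @ A @ R0T)  # Galerkin coarse operator, inverted densely

def local_solves(v):
    """Apply sum_k R_k^T A_k^{-1} R_k to v."""
    out = np.zeros_like(v)
    for idx, Ak_inv in zip(blocks, local_inv):
        out[idx] += Ak_inv @ v[idx]
    return out

def coarse_solve(v):
    """Apply R_0^T A_0^{-1} R_0 to v."""
    return R0T @ (A0_inv @ (R0T.T @ v))

def additive(g):
    """Additive Schwarz: coarse and local corrections both act on g."""
    return coarse_solve(g) + local_solves(g)

def hybrid(g):
    """Two-level hybrid scheme, steps (i) to (iv)."""
    u1 = local_solves(g)          # (i)
    r = g - A @ u1                # (ii); using r = g instead gives additive()
    e = coarse_solve(r)           # (iii)
    return u1 + e                 # (iv)
```

With a Galerkin coarse operator as assumed here, the residual left by `hybrid` has no component in the coarse space, which is one way to see the effect of updating the residual in step (ii).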


4 Adaptation for Non-conforming Meshes 


The important advantage of SEM in the context of AMR is its spatial decomposition 
into elements that can easily be split into smaller ones, and use of the local repre- 
sentation of the operators which decouples intra- and inter-element operations. As 
h-type AMR using the conforming-space/nonconforming-mesh approach leaves the 
approximation spaces unchanged, most of the tensor-product operations evaluated 
element-by-element are preserved, limiting the changes in the algorithm. 

The inter-element operations are mostly performed by the gather-scatter operator
$QQ^T$ which has to be redefined to include spectral interpolation at the
non-conforming faces. Following [7] we consider a non-conforming face shared by
one low resolution element (parent) and two (in 3D four) high resolution elements
(children). We introduce a local parent-to-child interpolation operator $J^{cp}$ which is
a spectral interpolation operator with entries

$$
(J^{cp})_{ij} = h_j(\zeta_i^{cp}),
$$

where $\zeta_i^{cp}$ represents the mapping of GLL points from the child face to its parent.
This operator is locally applied to give the desired nodal values on the child face,
after Q copies data from the parent to the children. Building a block-diagonal
matrix $J_L$ with local matrices $J^{cp}$ one can redefine the scatter $J_L Q$ and gather-scatter
$J_L Q Q^T J_L^T$ operators, respectively. For more discussion see Fig. 6 and Sect. 4 in [7].
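
A parent-to-child interpolation matrix of this kind can be sketched for a single edge of a 2D element. The example below is a minimal illustration, assuming that the parent edge is the reference interval [-1, 1] and that the two children cover [-1, 0] and [0, 1]; the helper names `gll_points` and `lagrange_matrix` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def gll_points(N):
    """GLL nodes on [-1, 1]: the endpoints plus the roots of P_N'(x)."""
    interior = leg.Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(np.real(interior)), [1.0]))

def lagrange_matrix(nodes, targets):
    """J[i, j] = h_j(targets[i]) for the Lagrange basis h_j on `nodes`."""
    J = np.ones((targets.size, nodes.size))
    for j in range(nodes.size):
        for m in range(nodes.size):
            if m != j:
                J[:, j] *= (targets - nodes[m]) / (nodes[j] - nodes[m])
    return J

N = 7
xi = gll_points(N)                     # GLL nodes on the parent edge
# Child GLL points expressed in the parent's reference coordinate.
zeta_left = 0.5 * (xi - 1.0)           # child covering [-1, 0]
zeta_right = 0.5 * (xi + 1.0)          # child covering [0, 1]
J_left = lagrange_matrix(xi, zeta_left)
J_right = lagrange_matrix(xi, zeta_right)

# Parent nodal values of a smooth function interpolated onto both children.
u_parent = np.cos(xi)
u_left, u_right = J_left @ u_parent, J_right @ u_parent
```

Because the interpolation is exact for polynomials up to degree N, the child values represent the same polynomial as the parent, which is what keeps the approximation space conforming on a non-conforming mesh.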

The next crucial modification is the diagonalisation of the global mass matrix
$Q^T B_L Q$ ($B_L$ is a block-diagonal matrix built of local mass matrices), whose inverse is
required in Eqs. (1) and (2). It is non-diagonal due to the fact that the quadrature
points in the elements along the non-conforming faces do not coincide. A
diagonalisation procedure is given in [7] and consists of building the global vector b

$$
\underline{b} := B \underline{e} = Q^T J_L^T B_L \underline{e}_L,
$$

and finally setting the lumped mass matrix $\tilde{B}_{ij} = \delta_{ij} b_i$. Here $\underline{e}$ and $\underline{e}_L$ denote the
global and local vectors containing all ones.
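
The lumping step can be illustrated with small random matrices. This is only a sketch under assumptions: the matrix `W` is a hypothetical stand-in for the combined scatter and interpolation operator (its rows sum to one, so that constants are reproduced), and `B_L` is a random positive diagonal matrix rather than an actual SEM mass matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_local, n_global = 8, 5

# Hypothetical stand-in for the scatter-plus-interpolation operator:
# every row sums to one, so the global vector of ones maps to local ones.
W = rng.random((n_local, n_global))
W /= W.sum(axis=1, keepdims=True)

B_L = np.diag(rng.uniform(0.5, 1.5, n_local))   # diagonal local mass matrix
B = W.T @ B_L @ W                                # assembled mass matrix, not diagonal

e_L = np.ones(n_local)
b = W.T @ B_L @ e_L                              # lumping vector b = B e
B_lumped = np.diag(b)                            # diagonal replacement for B
```

In this setting b equals the row sums of the assembled matrix, and the total mass (the sum of all entries) is unchanged by the lumping.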

The additive Schwarz preconditioner requires two significant modifications. The
first one is related to the assembly of the coarse grid operator $A_0$, which gets more
complex for non-conforming meshes. This is due to the fact that the non-conforming
mesh introduces hanging vertices located in the middle of faces or edges. These
hanging vertices are not global degrees of freedom and cannot be included in $A_0$.
To remove them from consideration one has to modify the set of local coarse base
functions $b_i$, which are thus dependent on the shape of the refined region as well as
the position and orientation of the child face with respect to the parent one. Unlike
the conforming case, where all $b_i$ could be represented by a tensor product of two
or three linear functions, the non-conforming mesh requires 5 basic components in
two and 21 in three dimensions to assemble all the possible shapes of $b_i$.

The last missing components are the restriction and prolongation operators,
$R_k$ and $R_k^T$, for the local Poisson problem. Taking into account the similarity
between these operators and $QQ^T$ and following the previous development we
use an operator similar to $J_L QQ^T J_L^T$, replacing $J_L$ with the interpolation operator
defined on the GL quadrature points. Although this choice seems to be optimal as
it preserves properties of the preconditioner and $J_L^T$ is well defined, our numerical
experiments showed a significant increase of pressure iterations in some cases. It
was found to be caused by the noise introduced by $J_L^T$ in the Schwarz operator.
To reduce this noise we replaced the transposed interpolation operator with the
inverse one, getting a significant reduction of iterations. Unfortunately, such a
preconditioner is no longer SPD and PCG cannot be used as an iterative solver
in this case. The other problem is the definition of $J_L^{-1}$, as $J^{cp}$ can be inverted for
square matrices only, thus excluding p-refinement strategies. To avoid this problem
we define a child-to-parent interpolation operator $J^{pc}$ with the entries

$$
(J^{pc})_{ij} =
\begin{cases}
h_j(\zeta_i^{pc}) & \text{if } \zeta_i^{p} \in \partial\Omega^p \cap \partial\Omega^c, \\
0 & \text{otherwise},
\end{cases}
$$

where $\partial\Omega^p$ and $\partial\Omega^c$ are the parent and child common faces, $\zeta_i^p$ is a parent GLL
point at the face $\partial\Omega^p$, and $\zeta_i^{pc}$ represents the mapping of $\zeta_i^p$ to the child face $\partial\Omega^c$.
This operator is locally applied to give the desired nodal values on the child face,
before $Q^T$ sums data from the children and the parent. Building a block-diagonal
matrix $J_L^{-1}$ consisting of local matrices $J^{pc}$ one can redefine the gather-scatter
operator $J_L QQ^T J_L^{-1}$ such that it is appropriate for the pressure preconditioner.

In a similar way we modify the multiplicative Schwarz method, as it shares
a number of features with the additive one. In this case we distinguish between
Schwarz (acting at a single level) and restriction (connecting different levels)
operators and apply $J_L QQ^T J_L^{-1}$ and $J_L QQ^T J_L^T$ to each of them, respectively. Unlike the
additive preconditioner, the hybrid one also requires the redefinition of the diagonal
weight matrix that indicates the number of sub-domains sharing a given node, and is
used to account for overlapping regions. Its value is important as it reduces the
largest eigenvalue of the $M^{-1}A$ operator and defines the smoothing properties of the
additive Schwarz step (see [4] and the references therein). In the conforming case
its definition is straightforward; the non-conforming case, however, is more involved
as hanging nodes are not real degrees of freedom. In the current implementation
the information about node multiplicity on the non-conforming faces is hidden from
the parent element, so the parent element sees only one neighbour instead of two
(four in 3D). Although this choice gives a preconditioner that significantly reduces
the number of pressure iterations, its performance for the studied cases is slightly
worse than the performance of the additive Schwarz preconditioner. This can be
caused by a non-optimal value of the weight matrix, or by the fact that the hybrid
preconditioner is superior to the additive one for high-aspect ratio elements (that
are not present in our adaptive simulations).


5 Parallel Performance 


The parallel performance test is based on one of the ExaFLOW flagship
calculations, and consists of the turbulent flow around a NACA4412 wing section
with 5° angle of attack, at a Reynolds number based on inflow velocity $U_\infty$ and
chord length c of $Re_c = 200{,}000$. It was previously studied in a series of
well-resolved large-eddy simulations conducted with the conforming Nek5000 version,
and discussed in detail in [19]. This flow configuration was chosen to illustrate the
significant benefit of using AMR, in particular when it comes to the farfield region
in the computational domain, but for this article we will only briefly discuss the
strong scaling results. We omit here a weak scaling test, as Nek5000 uses iterative
solvers and with the current example we cannot provide meaningful data.

Fig. 1 (a) Volume visualisation of that part of the domain covered by refinement levels higher than
one for the turbulent flow around a wing profile. The wing vicinity and wake region are resolved
and the colour indicates different refinement levels. (b) Strong scaling of the non-conforming
Nek5000 solver for the same case performed on Beskow. The plot shows the time per time step
as a function of node number. Each node consists of 32 cores

The initial coarse and conforming mesh consisted of 2190 elements with
polynomial order N = 7 and was evolved for 7.2 time units $c/U_\infty$ to drive
the refinement process using spectral error indicators [13, 14], allowing for
6 refinement levels. The resulting non-conforming grid was built of 224,272
elements with $76.37 \times 10^6$ degrees of freedom, resolving the wing surface and
the wake (Fig. 1a). This final mesh was used to test the parallel performance of
the non-conforming solver using the petascale Cray XC40 system Beskow at PDC
(Stockholm). This system consists of 2060 nodes with 32 cores per node and 2.438
PFlops peak performance. We compare our results with the scaling tests of the
conforming Nek5000 presented in Offermans et al. [15]. The most relevant test in
that article is the pipe flow at $Re_\tau = 360$ (upper-right plot in their Fig. 5), as it is similar
in size to the discussed wing case. We should mention here that our goal is not
to improve the parallel performance of the conforming code, but rather to retain
it despite the work imbalance introduced by an additional operator in the direct
stiffness summation of the non-conforming solver.

To be able to compare to the conforming solver, we focus on the time evolution 
loop only, excluding code initialisation, finalisation, mesh rebuilding within AMR 
and I/O operations. The result of the strong scaling test is presented in Fig. 1b 
showing the time per time step as a function of node count. This plot is almost
identical to the reference one in [15]. Both show slight super-linear scaling
between 32 and 256 nodes despite growing work imbalance for the non-conforming 
solver. We also reach the strong scaling limit at around 256 nodes, which for the 
conforming solver on Beskow was estimated to be between 30,000 and 50,000 
degrees of freedom per core [15]. This shows that the parallel performance of 
the non-conforming and conforming solvers is almost the same and proves the 
efficiency of our implementation. 




The maximum number of compute nodes used in the test was not set by
the parallel properties of the non-conforming Nek5000, but by the quality of the
domain partitioning provided by ParMETIS. Within ExaFLOW we developed a new
grid partitioning scheme for Nek5000 (not discussed in this paper) that takes into
account the core distribution among the nodes, and consists of two steps: inter- and
intra-node partitioning. Although this two-level partitioning scheme significantly
improves the efficiency of the coarse grid operations for XXT, especially during the
setup phase, it relies on the quality of the inter-node partitioning. If the first step gives
subdomains with disjoint graphs, the second step cannot be performed. We found
that the probability of getting disjoint graphs increases with decreasing number of
elements per node, virtually prohibiting runs with fewer than 1000 elements per
node, although this limit can differ between simulations. We note that in
the standard production use of the solver this limitation is not critical, as according
to [15] it is usually close to the strong scaling limit of conforming Nek5000.


6 Conclusions 


Within the ExaFLOW project we developed a fully functional SEM-based h-type
adaptive mesh refinement (AMR) solver for the incompressible Navier-Stokes
equations. This allows for much larger flow cases to be run at reduced cost, as the
high resolution grid is placed only in those regions where it is needed. At the same
time the simulation quality is improved, as the computational error can be controlled
during the run.

We have optimised the pressure preconditioners based on the additive overlapping
Schwarz and hybrid Schwarz-multigrid methods for non-conforming meshes.
To achieve this we modified the base functions for the assembly of the coarse-grid
operator to remove hanging nodes, and redefined the direct stiffness summation
operator to include spectral interpolation at the non-conforming faces and edges.
We introduced two operators, $J_L QQ^T J_L^T$ and $J_L QQ^T J_L^{-1}$, for the different steps in
the pressure calculation. The last crucial modification was the diagonalisation of the
global mass matrix.

Using real flow cases we show our AMR implementation to be correct and efficient.
An important success is the fact that the parallel performance of the conforming
and non-conforming solvers is very similar, despite the increased complexity of the
non-conforming one.

In the future we are going to investigate other definitions of the weight matrix for
the hybrid Schwarz-multigrid method, and to test different pressure preconditioners
based on the restricted additive Schwarz method [2, 3]. We will also work
on the quality of the graph partition, as the two-level partitioning does not accept
disjoint graphs on a node's subdomain.
Sparse Approximation of Multivariate
Functions from Small Datasets Via
Weighted Orthogonal Matching Pursuit


Ben Adcock and Simone Brugiapaglia 


1 Introduction 


In recent years, a new class of approximation strategies based on compressive 
sensing (CS) has been shown to be able to substantially lessen the curse of 
dimensionality in the context of approximation of multivariate functions from 
pointwise data, with applications to the uncertainty quantification of partial
differential equations with random inputs. Based on random sampling from orthogonal
polynomial systems and on weighted $\ell^1$ minimization, these techniques are able
to accurately recover a sparse approximation to a function of interest from a
small-sized dataset of pointwise samples. In this paper, we show the potential of weighted
greedy techniques as an alternative to convex minimization programs based on
weighted $\ell^1$ minimization in this context.

The contribution of this paper is twofold. First, we propose a weighted orthog- 
onal matching pursuit (WOMP) algorithm based on a rigorous derivation of the 
corresponding greedy index selection strategy. Second, we numerically show that 
WOMP is a promising alternative to convex recovery programs based on weighted
$\ell^1$ minimization, thanks to its ability to compute sparse approximations with an
accuracy comparable to those computed via weighted $\ell^1$ minimization, but with a
considerably lower computational cost when the target sparsity level (and, hence, the
number of WOMP iterations) is small enough. It is also worth observing here that
WOMP computes approximations that are exactly sparse, as opposed to approaches
based on weighted $\ell^1$ minimization, which provide compressible approximations in
general.
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Brief Literature Review Various approaches for multivariate function approxi- 
mation based on CS with applications to uncertainty quantification can be found 
in [1, 3-6, 11-13, 17]. An overview of greedy methods for sparse recovery in CS 
and, in particular of OMP, can be found in [7, Chapter 3.2]. For a general review 
on greedy algorithms, we refer the reader to [15] and references therein. Some 
numerical experiments on a weighted variant of OMP have been performed in the 
context of CS methods for uncertainty quantification in [4]. Weighted variants of 
OMP have also been considered in [10, 16], but the weighted procedure is tailored 
for specific signal processing applications and the term “weighted” does not refer 
to the weighted sparsity setting of [14] employed here. To the authors’ knowledge, 
the weighted variant of OMP considered in this paper seems to have been proposed 
here for the first time. 


Organization of the Paper In Sect. 2 we describe the setting of sparse multivariate 
function approximation in orthonormal systems via random sampling and weighted 
$\ell^1$ minimization. Then, in Sect. 3 we formally derive a strategy for the greedy
selection in the weighted sparsity setting and present the WOMP algorithm. Finally, 
we numerically show the effectiveness of the proposed technique in Sect. 4 and give 
our conclusions in Sect. 5. 


2 Sparse Multivariate Function Approximation 


We start by briefly introducing the framework of sparse multivariate function 
approximation from pointwise samples and refer the reader to [3] for further details. 
Our aim is to approximate a function defined over a high-dimensional domain

$$
f : D \to \mathbb{C}, \quad \text{with } D = (-1, 1)^d,
$$

where $d > 1$, from a dataset of pointwise samples $f(t_1), \ldots, f(t_m)$. Let $\nu$ be a
probability measure on D and let $\{\phi_j\}_{j \in \mathbb{N}_0^d}$ be an orthonormal basis for the Hilbert
space $L^2_\nu(D)$. In this paper, we will consider $\{\phi_j\}_{j \in \mathbb{N}_0^d}$ to be a tensorized family of
Legendre or Chebyshev orthogonal polynomials, with $\nu$ being the uniform or the
Chebyshev measure on D, respectively. Assuming that $f \in L^2_\nu(D) \cap L^\infty(D)$, we
consider the series expansion

$$
f = \sum_{j \in \mathbb{N}_0^d} x_j \phi_j.
$$

Then, we choose a finite set of multi-indices $\Lambda \subseteq \mathbb{N}_0^d$ with $|\Lambda| = N$ and obtain the
truncated series expansion

$$
f_\Lambda = \sum_{j \in \Lambda} x_j \phi_j.
$$




In practice, a convenient choice for $\Lambda$ is the hyperbolic cross of order $s$, i.e.

$$\Lambda = \Big\{ j \in \mathbb{N}_0^d : \prod_{k=1}^{d} (j_k + 1) \le s \Big\},$$

due to the moderate growth of $N$ with respect to $d$. Now, assuming we collect $m < N$
pointwise samples independently distributed according to $\nu$, namely,

$$f(t_1), \ldots, f(t_m), \quad \text{with } t_1, \ldots, t_m \overset{\text{i.i.d.}}{\sim} \nu,$$

the approximation problem can be recast as a linear system

$$A x_\Lambda = y + e, \tag{1}$$

with $x_\Lambda = (x_j)_{j \in \Lambda} \in \mathbb{C}^N$, and where the sensing matrix $A \in \mathbb{C}^{m \times N}$ and the
measurement vector $y \in \mathbb{C}^m$ are defined as

$$A_{ij} := \frac{1}{\sqrt{m}} \phi_j(t_i), \quad y_i := \frac{1}{\sqrt{m}} f(t_i), \quad \forall i \in [m], \ \forall j \in [N], \tag{2}$$

with $[k] := \{1, \ldots, k\}$ for every $k \in \mathbb{N}$. The vector $e \in \mathbb{C}^m$ accounts for the
truncation error introduced by $\Lambda$ and satisfies $\|e\|_2 \le \eta$, where $\eta > 0$ is an a
priori upper bound to the truncation $L^\infty(D)$-error, namely $\|f - f_\Lambda\|_{L^\infty(D)} \le \eta$. A
sparse approximation to the vector $x_\Lambda$ can then be computed by means of weighted $\ell^1$
minimization.
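
To make the size of the hyperbolic cross concrete, the following minimal Python sketch enumerates the multi-indices of $\Lambda$ for given $d$ and $s$. The function name `hyperbolic_cross` and the recursive enumeration are illustrative choices, not part of the original paper; the definition of $\Lambda$ used is the one stated above.

```python
def hyperbolic_cross(d, s):
    """List all multi-indices j in N_0^d with prod_k (j_k + 1) <= s.

    Illustrative helper (hypothetical name); the indices are returned
    in lexicographic order as tuples of length d.
    """
    indices = []

    def extend(prefix, partial_product):
        # prefix holds the first len(prefix) entries of j,
        # partial_product is the product of (j_k + 1) over those entries.
        if len(prefix) == d:
            indices.append(tuple(prefix))
            return
        j = 0
        while partial_product * (j + 1) <= s:
            extend(prefix + [j], partial_product * (j + 1))
            j += 1

    extend([], 1)
    return indices


if __name__ == "__main__":
    # Setting used in the numerical section: d = 10, s = 10.
    print(len(hyperbolic_cross(10, 10)))
```

For $d = 10$ and $s = 10$ the enumeration returns 571 multi-indices, in agreement with the value of $N$ reported in Sect. 4.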

Given weights $w \in \mathbb{R}^N$ with $w > 0$ (where the inequality is read
componentwise), recall that the weighted $\ell^1$ norm of a vector $z \in \mathbb{C}^N$ is defined as
$\|z\|_{1,w} := \sum_{j \in [N]} |z_j| w_j$. We can compute an approximation $\hat{x}_\Lambda$ to $x_\Lambda$ by solving
the weighted quadratically-constrained basis pursuit (WQCBP) program

$$\hat{x}_\Lambda \in \arg\min_{z \in \mathbb{C}^N} \|z\|_{1,w}, \quad \text{s.t. } \|Az - y\|_2 \le \eta, \tag{3}$$

where the weights $w \in \mathbb{R}^N$ are defined as

$$w_j = \|\phi_j\|_{L^\infty(D)}. \tag{4}$$

The effectiveness of this particular choice of $w$ is supported by theoretical results
and it has been validated from the numerical viewpoint (see [1, 3]). The resulting
approximation $\hat{f}_\Lambda$ to $f$ is finally defined as

$$\hat{f}_\Lambda = \sum_{j \in \Lambda} (\hat{x}_\Lambda)_j \phi_j.$$
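
The weights in (4) admit simple closed forms for the two bases considered here. The sketch below is a hedged illustration: it assumes the usual normalizations, namely $\phi_j = \prod_k \sqrt{2 j_k + 1}\, P_{j_k}$ for Legendre polynomials orthonormal with respect to the uniform probability measure, and $\phi_j = \prod_k \sqrt{2}^{\,[j_k \neq 0]}\, T_{j_k}$ for Chebyshev polynomials orthonormal with respect to the Chebyshev probability measure. The function names are hypothetical.

```python
import numpy as np


def legendre_weights(indices):
    """Sup-norms of tensorized orthonormal Legendre polynomials.

    Assumes orthonormality w.r.t. the uniform probability measure on
    (-1, 1)^d, so that w_j = prod_k sqrt(2 * j_k + 1).
    """
    j = np.asarray(indices)
    return np.sqrt(np.prod(2 * j + 1, axis=1))


def chebyshev_weights(indices):
    """Sup-norms of tensorized orthonormal Chebyshev polynomials.

    Assumes orthonormality w.r.t. the Chebyshev probability measure,
    so that w_j = 2^(number of nonzero entries of j / 2).
    """
    j = np.asarray(indices)
    return np.sqrt(2.0) ** np.count_nonzero(j, axis=1)


if __name__ == "__main__":
    idx = [[0, 0], [1, 2], [0, 3]]
    print(legendre_weights(idx))
    print(chebyshev_weights(idx))
```

In both cases the weights grow with the degree or with the number of active variables, so high-order and highly mixed terms are penalized more strongly in (3).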
In this setting, stable and robust recovery guarantees in high probability can be
shown for the approximation errors $\|f - \hat{f}_\Lambda\|_{L^2_\nu(D)}$ and $\|f - \hat{f}_\Lambda\|_{L^\infty(D)}$ under a
sufficient condition on the number of samples of the form $m \gtrsim s^\gamma \cdot \mathrm{polylog}(s, d)$,
with $\gamma = 2$ or $\gamma = \log(3)/\log(2)$ for tensorized Legendre or Chebyshev
polynomials, respectively, hence lessening the curse of dimensionality to a substantial
extent (see [3] and references therein). We also note in passing that decoders such
as the weighted LASSO or the weighted square-root LASSO can be considered as
alternatives to (3) for weighted $\ell^1$ minimization (see [2]).


3 Weighted Orthogonal Matching Pursuit 


In this paper, we consider greedy sparse recovery strategies to find sparse
approximate solutions to (1), as alternatives to the WQCBP optimization program (3). With
this aim, we propose a variation of the OMP algorithm to the weighted setting.

Before introducing weighted OMP (WOMP) in Algorithm 1, let us recall
the rationale behind the greedy index selection rule of OMP (corresponding to
Algorithm 1 with $\lambda = 0$ and $w = 1$). For a detailed introduction to OMP, we
refer the reader to [7, Section 3.2]. Given a support set $S \subseteq [N]$, OMP solves the
least-squares problem

$$\min_{z \in \mathbb{C}^N} G_0(z) \quad \text{s.t. } \mathrm{supp}(z) \subseteq S,$$

where $G_0(z) := \|y - Az\|_2^2$. In OMP, the support $S$ is iteratively enlarged by one
index at a time. Namely, we consider the update $S \cup \{j\}$, where the index $j \in [N]$
is selected in a greedy fashion. In particular, assuming that $A$ has $\ell^2$-normalized
columns, it is possible to show that (see [7, Lemma 3.3])

$$\min_{t \in \mathbb{C}} G_0(x + t e_j) = G_0(x) - |(A^*(y - Ax))_j|^2. \tag{5}$$

This leads to the greedy index selection rule used by OMP, which prescribes
the selection of an index $j \in [N]$ that maximizes the quantity $|(A^*(y - Ax))_j|^2$.
We will use this simple intuition to extend OMP to the weighted case by replacing
the function $G_0$ with a suitable function $G_\lambda$ that takes into account the data-fidelity
term and the weighted sparsity prior at the same time.

Let us recall that, given a set of weights $w \in \mathbb{R}^N$ with $w > 0$, the weighted $\ell^0$
norm of a vector $z \in \mathbb{C}^N$ is defined as the quantity (see [14])¹

$$\|z\|_{0,w} := \sum_{j \in \mathrm{supp}(z)} w_j^2.$$

¹ The term "norm" here is an abuse of language, but we will stick to it due to its popularity.

Notice that when $w = 1$, then $\|\cdot\|_{0,w} = \|\cdot\|_0$ is the standard $\ell^0$ norm. Given $\lambda \ge 0$,
we define the function

$$G_\lambda(z) := \|y - Az\|_2^2 + \lambda \|z\|_{0,w}. \tag{6}$$

The tradeoff between the data-fidelity constraint and the weighted sparsity prior
is balanced via the choice of the regularization parameter $\lambda$. Applying the same
rationale employed in OMP for the greedy index selection and replacing $G_0$ with
$G_\lambda$ leads to Algorithm 1, which corresponds to OMP when $\lambda = 0$ and $w = 1$.


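Algorithm 1 Weighted orthogonal matching pursuit (WOMP)
Inputs:

• $A \in \mathbb{C}^{m \times N}$: sampling matrix, with $\ell^2$-normalized columns;
• $y \in \mathbb{C}^m$: vector of samples;
• $w \in \mathbb{R}^N$: weights;
• $\lambda \ge 0$: regularization parameter;
• $K \in \mathbb{N}$: number of iterations.

Procedure:

1. Let $\hat{x}_0 = 0$ and $S_0 = \emptyset$;
2. For $k = 1, \ldots, K$:

   a. Find $j_k \in \arg\max_{j \in [N]} \Delta_\lambda(\hat{x}_{k-1}, S_{k-1}, j)$, with $\Delta_\lambda$ as in (7);
   b. Define $S_k = S_{k-1} \cup \{j_k\}$;
   c. Compute $\hat{x}_k \in \arg\min_{v \in \mathbb{C}^N} \|Av - y\|_2$ s.t. $\mathrm{supp}(v) \subseteq S_k$.

Output:

• $\hat{x}_K \in \mathbb{C}^N$: approximate solution to $Az = y$.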


Remark 1 The $\ell^2$-normalization of the columns of $A$ is a necessary condition to
apply Algorithm 1. If $A$ does not satisfy this hypothesis, it suffices to apply WOMP
to the normalized system $\tilde{A} z = y$, where $\tilde{A} = A M^{-1}$ and $M$ is the matrix containing
the $\ell^2$ norms of the columns of $A$ on the main diagonal and zeroes elsewhere. The
approximate solution $\hat{x}_K$ to $\tilde{A} z = y$ computed via WOMP is then rescaled as $M^{-1} \hat{x}_K$,
which approximately solves $Az = y$.
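
The steps of Algorithm 1 and the rescaling of Remark 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ties in the arg max are broken by taking the first maximizer, and the loop stops early when no index has a positive score (an added convention, since in that case no single-index update decreases $G_\lambda$).

```python
import numpy as np


def selection_scores(A, y, x, support, w, lam):
    """Evaluate Delta_lambda(x, S, j) from (7) for every index j."""
    corr = np.abs(A.conj().T @ (y - A @ x)) ** 2      # |(A^*(y - Ax))_j|^2
    scores = np.maximum(corr - lam * w**2, 0.0)        # case j not in S
    for j in support:                                  # cases j in S
        if x[j] != 0:
            scores[j] = max(lam * w[j] ** 2 - abs(x[j]) ** 2, 0.0)
        else:
            scores[j] = 0.0
    return scores


def womp(A, y, w, lam, K):
    """Sketch of WOMP; A is assumed to have l2-normalized columns."""
    A = np.asarray(A)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    support = []
    for _ in range(K):
        scores = selection_scores(A, y, x, support, w, lam)
        j = int(np.argmax(scores))
        if scores[j] <= 0.0:
            break  # assumption: stop when no index has a positive score
        if j not in support:
            support.append(j)
        # Least-squares fit restricted to the current support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros_like(x)
        x[support] = coef
    return x, sorted(support)


def normalize_columns(A):
    """Return A M^{-1} and the diagonal of M (column l2 norms), cf. Remark 1."""
    norms = np.linalg.norm(A, axis=0)
    return A / norms, norms
```

With a non-normalized matrix one would call `A_tilde, norms = normalize_columns(A)`, run `womp` on `A_tilde`, and divide the returned vector entrywise by `norms`. Setting `lam = 0` and `w = 1` gives plain OMP (up to the early-stopping convention above).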


The following proposition justifies the weighted variant of OMP considered
in Algorithm 1. In order to minimize $G_\lambda$ as much as possible, at each iteration,
WOMP selects the index $j$ that maximizes the quantity $\Delta_\lambda(x, S, j)$ defined in (7).
The proposition makes the role of the quantity $\Delta_\lambda(x, S, j)$ transparent,
generalizing relation (5) to the weighted case, under suitable conditions on $A$ and $x$
that are verified at each iteration of Algorithm 1.
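Proposition 1 Let $\lambda \ge 0$, $S \subseteq [N]$, $A \in \mathbb{C}^{m \times N}$ with $\ell^2$-normalized columns, and
$x \in \mathbb{C}^N$ satisfying

$$x \in \arg\min_{z \in \mathbb{C}^N} \|y - Az\|_2^2 \quad \text{s.t. } \mathrm{supp}(z) \subseteq S.$$

Then, for every $j \in [N]$, the following holds:

$$\min_{t \in \mathbb{C}} G_\lambda(x + t e_j) = G_\lambda(x) - \Delta_\lambda(x, S, j),$$

where $G_\lambda$ is defined as in (6) and $\Delta_\lambda : \mathbb{C}^N \times 2^{[N]} \times [N] \to \mathbb{R}$ is defined by

$$\Delta_\lambda(x, S, j) = \begin{cases} \max\{|(A^*(y - Ax))_j|^2 - \lambda w_j^2, \, 0\} & \text{if } j \notin S, \\ \max\{\lambda w_j^2 - |x_j|^2, \, 0\} & \text{if } j \in S \text{ and } x_j \ne 0, \\ 0 & \text{if } j \in S \text{ and } x_j = 0. \end{cases} \tag{7}$$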


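Proof Throughout the proof, we will denote the residual as $r := y - Ax$.
Let us first assume $j \notin S$. In this case, we compute

$$G_\lambda(x + t e_j) = \|y - A(x + t e_j)\|_2^2 + \lambda \|x + t e_j\|_{0,w} = \|r\|_2^2 + \underbrace{|t|^2 - 2\,\mathrm{Re}(\bar{t}\,(A^* r)_j) + \lambda (1 - \delta_{t,0}) w_j^2}_{=: h(t)} + \lambda \|x\|_{0,w},$$

where $\delta_{\cdot,\cdot}$ is the Kronecker delta function. In particular, we have

$$h(t) = \begin{cases} 0 & \text{if } t = 0, \\ |t|^2 - 2\,\mathrm{Re}(\bar{t}\,(A^* r)_j) + \lambda w_j^2 & \text{if } t \in \mathbb{C} \setminus \{0\}. \end{cases}$$

Now, if $(A^* r)_j = 0$, then $h(t)$ is minimized for $t = 0$ and $\min_{t \in \mathbb{C}} G_\lambda(x + t e_j) =
G_\lambda(x)$. On the other hand, if $(A^* r)_j \ne 0$, by arguing similarly to [7, Lemma 3.3], we
see that

$$\min_{t \in \mathbb{C} \setminus \{0\}} h(t) = -|(A^* r)_j|^2 + \lambda w_j^2,$$

where the minimum is realized for some $t \in \mathbb{C}$ with $|t| = |(A^* r)_j| \ne 0$. In
summary,

$$\min_{t \in \mathbb{C}} h(t) = \min\{-|(A^* r)_j|^2 + \lambda w_j^2, \, 0\} = -\max\{|(A^* r)_j|^2 - \lambda w_j^2, \, 0\},$$

which concludes the case $j \notin S$.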




Now, assume $j \in S$. Since the vector $x_S = x|_S \in \mathbb{C}^{|S|}$ is a least-squares solution
to $A_S z = y$, it satisfies $A_S^*(y - A_S x_S) = 0$ and, in particular, $(A^* r)_j = 0$. (Here,
$A_S \in \mathbb{C}^{m \times |S|}$ denotes the submatrix of $A$ corresponding to the columns in $S$.)
Therefore, arguing similarly as before, we have

$$G_\lambda(x + t e_j) = \|r\|_2^2 + \underbrace{|t|^2 + \lambda (1 - \delta_{t,-x_j}) w_j^2}_{=: \ell(t)} + \lambda \|x - x_j e_j\|_{0,w}.$$

Considering only the terms depending on $t$, it is not difficult to see that

$$\min_{t \in \mathbb{C}} \ell(t) = \min\{|x_j|^2, \, \lambda w_j^2\}.$$

As a consequence, for every $j \in S$, we obtain

$$\min_{t \in \mathbb{C}} G_\lambda(x + t e_j) = \|r\|_2^2 + \lambda \|x - x_j e_j\|_{0,w} + \min\{|x_j|^2, \, \lambda w_j^2\} = G_\lambda(x) + \min\{|x_j|^2, \, \lambda w_j^2\} - \lambda (1 - \delta_{x_j,0}) w_j^2.$$

The results above combined with simple algebraic manipulations lead to the desired
result. □


4 Numerical Results 


In this section, we show the effectiveness of WOMP (Algorithm 1) in the sparse
multivariate function approximation setting described in Sect. 2. In particular, we
choose the weights $w$ as in (4). We consider the function

$$f(t) = \ln\Big(d + 1 + \sum_{k=1}^{d} t_k\Big), \quad \text{with } d = 10. \tag{8}$$

We let $\{\phi_j\}_{j \in \mathbb{N}_0^d}$ be the Legendre and Chebyshev bases and $\nu$ be the respective
orthogonality measure. In Figs. 1 and 2 we show the relative $L^2_\nu(D)$-error of the
approximate solution $\hat{x}_K$ computed via WOMP as a function of iteration $K$, for
different values of the regularization parameter $\lambda$ in order to solve the linear system
$Az = y$, where $A$ and $y$ are defined by (2) and where the $\ell^2$-normalization of the
columns of $A$ is taken into account according to Remark 1. We consider $\lambda = 0$
(corresponding to OMP) and $\lambda = 10^{-k}$, with $k = 3, 3.5, 4, 4.5, 5$. Here, $\Lambda$ is the
hyperbolic cross of order $s = 10$, corresponding to $N = |\Lambda| = 571$. Moreover, we
consider $m = 60$ and $m = 80$. The results are averaged over 25 runs and the $L^2_\nu(D)$-
error is computed with respect to a reference solution approximated via least squares
[Figure 1 panels: Legendre basis, d = 10, s = 10, m = 60 (left); Legendre basis, d = 10, s = 10, m = 80 (right)]

Fig. 1 Plot of the mean relative $L^2_\nu(D)$-error as a function of the number of iterations $K$ of WOMP
(Algorithm 1) for different values of the regularization parameter $\lambda$ for the approximation of the
function $f$ defined in (8) and using Legendre polynomials. The accuracy of WOMP is compared
with those of QCBP and WQCBP

Fig. 2 The same experiment as in Fig. 1, with Chebyshev polynomials


and using $20N = 11{,}420$ random i.i.d. samples according to $\nu$. We compare the
WOMP accuracy with the accuracy obtained via the QCBP program (3) with $w = 1$
and WQCBP with tolerance parameter $\eta = 10^{-8}$. To solve these two programs we
use CVX Version 1.2, a package for specifying and solving convex programs [8, 9].
In CVX, we use the solver 'mosek' and we set CVX precision to 'high'.
Figures 1 and 2 show the benefits of using weights as compared to the unweighted
OMP approach, when the parameter $\lambda$ is tuned appropriately. A good choice of $\lambda$ for
the setting considered here seems to be between $10^{-4.5}$ and $10^{-3.5}$. We also observe
that WOMP is able to reach a similar level of accuracy as WQCBP. An interesting
feature of WOMP with respect to OMP is its better stability. We observe that after
the $m$-th iteration, the OMP accuracy starts getting substantially worse. This can be
explained by the fact that when $K$ approaches $N$, OMP tends to destroy sparsity by
fitting the data too much. This phenomenon is not observed in WOMP, thanks to its
[Figure 3 panels: Legendre basis, d = 10, s = 10, m = 60 (left); Chebyshev basis, d = 10, s = 10, m = 60 (right)]

Fig. 3 Plot of the support size of $\hat{x}_K$ as a function of the number of iterations $K$ for WOMP
in the same setting as in Figs. 1 and 2, with Legendre (left) and Chebyshev (right) polynomials.
The larger the regularization parameter $\lambda$, the sparser the solution (in the left plot, the curves relative
to $\lambda = 10^{-4.5}$ and $\lambda = 10^{-4}$ overlap. In the right plot, the same happens for $\lambda = 10^{-4}$ and
$\lambda = 10^{-3.5}$)


Table 1 Comparison of the computing times for WQCBP and K = 25 iterations of WOMP

|           |    |         |         |         | WOMP with λ as below                             |
| Basis     | m  | QCBP    | WQCBP   | OMP     | 10^-5   | 10^-4.5 | 10^-4   | 10^-3.5 | 10^-3   |
| Legendre  | 60 | 1.9e-01 | 2.0e-01 | 1.6e-02 | 1.3e-02 | 1.2e-02 | 1.3e-02 | 1.2e-02 | 1.2e-02 |
| Legendre  | 80 | 2.1e-01 | 2.1e-01 | 1.7e-02 | 1.5e-02 | 1.3e-02 | 1.4e-02 | 1.4e-02 | 1.3e-02 |
| Chebyshev | 60 | 1.9e-01 | 1.9e-01 | 1.5e-02 | 1.3e-02 | 1.2e-02 | 1.2e-02 | 1.2e-02 | 1.2e-02 |
| Chebyshev | 80 | 2.1e-01 | 2.1e-01 | 1.7e-02 | 1.5e-02 | 1.3e-02 | 1.4e-02 | 1.4e-02 | 1.4e-02 |


ability to keep the support of $\hat{x}_K$ small via the explicit enforcement of the weighted
sparsity prior (see Fig. 3).

We show the better computational efficiency of WOMP with respect to the
convex minimization programs QCBP and WQCBP solved via CVX by tracking
the runtimes for the different approaches. In Table 1 we show the running times
for the different recovery strategies. The running times for WOMP refer to
$K = 25$ iterations, sufficient to reach the best accuracy for every value of $\lambda$ as shown
in Figs. 1 and 2. Moreover, the computational times for WOMP take into account
the $\ell^2$-normalization of the columns of $A$ (see Remark 1). WOMP consistently
outperforms convex minimization, being more than ten times faster in all cases. We
note that in this comparison a key role is played by the parameter $K$ or, equivalently,
by the sparsity of the solution. Indeed, in this case, considering a larger value of
$K$ would result in a slower performance of WOMP, but it would not improve the
accuracy of the WOMP solution (see Figs. 1 and 2).

5 Conclusions


We have considered a greedy recovery strategy for high-dimensional function
approximation from a small set of pointwise samples. In particular, we have
proposed a generalization of the OMP algorithm to the setting of weighted
sparsity (Algorithm 1). The corresponding greedy selection strategy is derived in
Proposition 1.

Numerical experiments show that WOMP is an effective strategy for
high-dimensional approximation, able to reach the same accuracy level as WQCBP while
being considerably faster when the target sparsity level is small enough. A key role
is played by the regularization parameter $\lambda$, which may be difficult to tune due to
its sensitivity to the parameters of the problem ($m$, $s$, and $d$) and to the polynomial
basis employed. In other applications, where explicit formulas for the weights such as (4)
are not available, there might also be a nontrivial interplay between $\lambda$ and $w$. In
summary, despite the promising nature of the numerical experiments illustrated in
this paper, a more extensive numerical investigation is needed in order to study
the sensitivity of WOMP with respect to $\lambda$. Moreover, a theoretical analysis of the
WOMP approach might highlight practical recipes for the choice of this parameter,
similarly to [2]. This type of analysis may also help identify the sparsity regime
where WOMP outperforms weighted $\ell^1$ minimization, which, in turn, could be
formulated in terms of suitable assumptions on the regularity of $f$. These questions
are beyond the scope of this paper and will be the object of future work.

On the Convergence Rate
of Hermite-Fejér Interpolation


Shuhuang Xiang and Guo He 


1 Introduction 


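For an arbitrarily given system of points

$$\{x_1^{(n)}, x_2^{(n)}, \ldots, x_n^{(n)}\}_{n=1}^{\infty}, \tag{1}$$

Faber [3] in 1914 showed that there exists a continuous function $f(x)$ in $[-1, 1]$ for
which the Lagrange interpolation sequence $L_n[f]$ ($n = 1, 2, \ldots$) is not uniformly
convergent to $f$ in $[-1, 1]$, where $\omega_n(x) = (x - x_1^{(n)})(x - x_2^{(n)}) \cdots (x - x_n^{(n)})$ and

$$L_n[f](x) = \sum_{k=1}^{n} f(x_k^{(n)}) \ell_k^{(n)}(x), \quad \ell_k^{(n)}(x) = \frac{\omega_n(x)}{\omega_n'(x_k^{(n)})(x - x_k^{(n)})}. \tag{2}$$

In contrast, based on the Chebyshev pointsystem

$$x_k^{(n)} = \cos\Big(\frac{2k-1}{2n}\pi\Big), \quad k = 1, 2, \ldots, n, \quad n = 1, 2, \ldots, \tag{3}$$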
Fig. 1 $\|H_{2n-1}(f, x) - f(x)\|_\infty$, $\|L_n[f](x) - f(x)\|_\infty$ and $\|H^*_{2n-1}(f, x) - f(x)\|_\infty$ at $x = -1 : 0.001 : 1$
by using Chebyshev pointsystem (3) for $f(x) = \sin(x)$, $f(x) = \frac{1}{1+25x^2}$ and $f(x) = |x|$,
respectively


Fejér [4] in 1916 proved that if $f \in C[-1, 1]$, then there is a unique polynomial
$H_{2n-1}(f, x)$ of degree at most $2n - 1$ such that $\lim_{n \to \infty} \|H_{2n-1}(f) - f\|_\infty = 0$,
where $H_{2n-1}(f, x)$ is determined by

$$H_{2n-1}(f, x_k^{(n)}) = f(x_k^{(n)}), \quad H_{2n-1}'(f, x_k^{(n)}) = 0, \quad k = 1, 2, \ldots, n. \tag{4}$$


This polynomial is known as the Hermite-Fejér interpolation polynomial. 

It is particularly notable that the above Hermite-Fejér interpolation polynomial
converges much more slowly than the corresponding Lagrange interpolation
polynomial at the Chebyshev pointsystem (3) (see Fig. 1).

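To get fast convergence, the following Hermite-Fejér interpolation of $f(x)$ at
nodes (1) is considered [6, 7]:

$$H^*_{2n-1}(f, x) = \sum_{k=1}^{n} f(x_k^{(n)}) h_k^{(n)}(x) + \sum_{k=1}^{n} f'(x_k^{(n)}) b_k^{(n)}(x), \tag{5}$$

where $h_k^{(n)}(x) = v_k^{(n)}(x) \big(\ell_k^{(n)}(x)\big)^2$, $b_k^{(n)}(x) = (x - x_k^{(n)}) \big(\ell_k^{(n)}(x)\big)^2$ and
$v_k^{(n)}(x) = 1 - (x - x_k^{(n)}) \dfrac{\omega_n''(x_k^{(n)})}{\omega_n'(x_k^{(n)})}$.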

Fejér [5] and Grünwald [7] also showed that the convergence of the Hermite-
Fejér interpolation of $f(x)$ depends on the choice of the nodes. The pointsystem
(1) is called normal if for all $n$

$$v_k^{(n)}(x) = 1 - (x - x_k^{(n)}) \frac{\omega_n''(x_k^{(n)})}{\omega_n'(x_k^{(n)})} \ge 0, \quad k = 1, 2, \ldots, n, \quad x \in [-1, 1], \tag{6}$$

while the pointsystem (1) is called strongly normal if for all $n$

$$v_k^{(n)}(x) \ge c > 0, \quad k = 1, 2, \ldots, n, \quad x \in [-1, 1] \tag{7}$$

for some positive constant $c$.
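
The interpolants (4) and (5) can be evaluated directly from their definitions. The following sketch is an illustration only: it uses the identity $\omega_n''(x_k)/\omega_n'(x_k) = 2 \sum_{i \ne k} 1/(x_k - x_i)$ to form $v_k^{(n)}(x)$, evaluates $\ell_k^{(n)}$ as a plain product (adequate for small $n$, not a fast or numerically robust algorithm such as the one in [17]), and all function names are hypothetical.

```python
import numpy as np


def chebyshev_points(n):
    """Chebyshev pointsystem (3): x_k = cos((2k - 1) pi / (2n))."""
    k = np.arange(1, n + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * n))


def hermite_fejer(nodes, fvals, dfvals, x):
    """Evaluate sum_k f_k h_k(x) + sum_k df_k b_k(x) at a scalar x.

    Passing dfvals = 0 gives the classical interpolant (4); passing the
    derivative values f'(x_k) gives the interpolant (5).
    """
    nodes = np.asarray(nodes, dtype=float)
    total = 0.0
    for k, xk in enumerate(nodes):
        others = np.delete(nodes, k)
        ell = np.prod((x - others) / (xk - others))    # Lagrange basis l_k(x)
        ratio = 2.0 * np.sum(1.0 / (xk - others))      # omega''(x_k) / omega'(x_k)
        v = 1.0 - (x - xk) * ratio                     # v_k(x), cf. (6)
        total += fvals[k] * v * ell**2 + dfvals[k] * (x - xk) * ell**2
    return total


if __name__ == "__main__":
    nodes = chebyshev_points(4)
    # (5) reproduces polynomials of degree <= 2n - 1, here f(x) = x^5.
    print(hermite_fejer(nodes, nodes**5, 5 * nodes**4, 0.3), 0.3**5)
```

The same loop can be used to inspect the sign of $v_k^{(n)}(x)$ on a grid, which gives a quick numerical check of whether a given pointsystem is normal in the sense of (6).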
Fejér [5] (also see Szegő [12, pp 339]) showed that for the zeros of the Jacobi
polynomial $P_n^{(\alpha,\beta)}(x)$ of degree $n$ ($\alpha > -1$, $\beta > -1$)

$$v_k^{(n)}(x) \ge \min\{-\alpha, -\beta\} \quad \text{for } -1 < \alpha \le 0, \ -1 < \beta \le 0, \ k = 1, 2, \ldots, n \text{ and } x \in [-1, 1].$$


For (strongly) normal pointsystems, Grünwald [7] showed that for every f Е 
C (-1, 1), limo HZ, 4C — fllo = 0 if (x1) is strongly normal satisfying 
(T) and { f (x(?)) satisfies 


LF «| < n^ for some given positive number ô, k=1,2,..., п=1,2,..., 


while іт, > со || H5, 


100) = fleo = Ош [-1+ €, 1 — e] for each fixed 0 < € < 1 


if {х{?} is normal and ( f (x1) is uniformly bounded for n = 1,2, .. ..! 


Moreover, Szabados [11] showed the convergence of the Hermite-Fejér interpo- 
lation (5) at the Chebyshev pointsystem (3) satisfies 


И — Baloo = ОХ — plein (8) 


where p* is the best approximation polynomial of f with degree at most 2n — 1 and 
llf — Ренн = maxosjzi llf? — р*? |. 

Hermite-Fejér interpolation is widely used in computer aided
geometric design with boundary conditions including derivative information. The
convergence rate under the infinity norm has been extensively studied in [5-
7, 11, 14]. An efficient algorithm for the fast implementation of Hermite-Fejér
interpolation at zeros of Jacobi polynomials can be found in [17].

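In this paper, the following convergence rates of Hermite-Fejér interpolation
$H^*_{2n-1}(f, x)$ at Gauss-Jacobi pointsystems are considered.

• If $f$ is analytic in $E_\rho$ with $|f(z)| \le M$, then

$$\|f(x) - H^*_{2n-1}(f, x)\|_\infty = \begin{cases} \dfrac{4 \tau_n M [2n\rho^2 + (1 - 2n)\rho]}{(\rho - 1)^2 \rho^{2n}}, & \gamma \le 0, \\[2ex] O\!\left(\dfrac{n^{2+2\gamma} [2n\rho^2 + (1 - 2n)\rho]}{(\rho - 1)^2 \rho^{2n}}\right), & \gamma > 0, \end{cases} \tag{9}$$

where $\gamma = \max\{\alpha, \beta\}$ and

$$\tau_n = \begin{cases} O(n^{-1.5 - \min\{\alpha,\beta\}} \log n), & \text{if } -1 < \min\{\alpha, \beta\} \le \gamma \le -\frac{1}{2}, \\ O(n^{2\gamma - \min\{\alpha,\beta\} - \frac{1}{2}}), & \text{if } -1 < \min\{\alpha, \beta\} \le -\frac{1}{2} < \gamma \le 0, \\ O(n^{2\gamma}), & \text{if } -\frac{1}{2} < \min\{\alpha, \beta\} \le \gamma \le 0. \end{cases} \tag{10}$$

¹ In fact, Grünwald in [7] considered more general cases with any vector $(d_k^{(n)})$ instead of
$(f'(x_k^{(n)}))$.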


• If $f(x)$ has an absolutely continuous $(r-1)$st derivative $f^{(r-1)}$ on $[-1, 1]$ for an
integer $r \ge 3$, and an $r$th derivative $f^{(r)}$ of bounded variation $V_r = \mathrm{Var}(f^{(r)}) <
\infty$, then

2 
1 (11) 
2 


ИЛ) — Ay, (хо = " (12”) 
Comparing these results with 


f(x) — Hon-1(f, x) = E. ‚ (Vértesi [14]), 


O(n”), ify > —4 


NI 


which is sharp and attainable (see Fig. 2), we see that $H^*_{2n-1}(f, x)$ converges much
faster than $H_{2n-1}(f, x)$ for analytic functions or functions of higher regularity (see
Fig. 1). In particular, $H_{2n-1}(f, x)$ diverges at Gauss-Jacobi pointsystems with $\gamma > 0$,
whereas $H^*_{2n-1}(f, x)$ converges for functions analytic in the Bernstein ellipse or of
finite limited regularity.


[Figure 2 panels, all for f(x) = |x| and n = 11:2:1000: α = -0.5, β = -0.5; α = -0.9, β = -0.8; α = -0.3, β = -0.6; α = -0.2, β = 0.3]

Fig. 2 $\|H_{2n-1}(f, x) - f(x)\|_\infty$ at $x = -1 : 0.001 : 1$ by using Gauss-Jacobi pointsystem for
$f(x) = |x|$ with different $\alpha$ and $\beta$, respectively
For simplicity, in the following we abbreviate $x_k^{(n)}$ as $x_k$, $\ell_k^{(n)}(x)$ as $\ell_k(x)$, $h_k^{(n)}(x)$
as $h_k(x)$, and $P_n^{(\alpha,\beta)}(x)$ as $P_n(x)$. $A \sim B$ denotes that there exist two positive constants $c_1$
and $c_2$ such that $c_1 \le |A|/|B| \le c_2$.


2 Main Results 


Suppose $f(x)$ satisfies a Dini-Lipschitz condition on $[-1, 1]$; then it has the
following absolutely and uniformly convergent Chebyshev series expansion

$$f(x) = \sum_{j=0}^{\infty}{}' \, c_j T_j(x), \quad c_j = \frac{2}{\pi} \int_{-1}^{1} \frac{f(x) T_j(x)}{\sqrt{1 - x^2}} \, dx, \quad j = 0, 1, \ldots, \tag{12}$$

where the prime denotes summation whose first term is halved, and $T_j(x) =
\cos(j \cos^{-1} x)$ denotes the Chebyshev polynomial of degree $j$.
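
The coefficients $c_j$ in (12) can be approximated numerically. The sketch below is an illustration under stated assumptions: it replaces the integral by the $n$-point Gauss-Chebyshev quadrature rule at the nodes (3), which gives $c_j \approx \frac{2}{n} \sum_{k=1}^{n} f(x_k) \cos(j\theta_k)$ with $x_k = \cos\theta_k$; this is exact whenever $f \cdot T_j$ is a polynomial of degree at most $2n - 1$. The function name and default number of nodes are hypothetical choices.

```python
import numpy as np


def chebyshev_coefficients(f, n_coeffs, n_nodes=None):
    """Approximate c_0, ..., c_{n_coeffs-1} of the expansion (12).

    Uses Gauss-Chebyshev quadrature with n_nodes points; f must accept
    a NumPy array of evaluation points.
    """
    if n_nodes is None:
        n_nodes = 2 * n_coeffs  # illustrative default, not from the paper
    k = np.arange(1, n_nodes + 1)
    theta = (2 * k - 1) * np.pi / (2 * n_nodes)
    fx = f(np.cos(theta))
    j = np.arange(n_coeffs)
    # T_j(cos(theta)) = cos(j * theta)
    return (2.0 / n_nodes) * (np.cos(np.outer(j, theta)) @ fx)


if __name__ == "__main__":
    # x^2 = T_0 / 2 + T_2 / 2, i.e. c_0 = 1 (halved in the sum) and c_2 = 1/2.
    print(chebyshev_coefficients(lambda x: x**2, 4))
```

Plotting $|c_j|$ against $j$ for an analytic function and for a function of limited regularity is a simple way to visualize the geometric and algebraic decay rates stated in Lemma 1.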


Lemma 1

(i) (Bernstein [2]) If $f$ is analytic with $|f(z)| \le M$ in the region bounded by
the ellipse $E_\rho$ with foci $\pm 1$ and major and minor semiaxis lengths summing to
$\rho > 1$, then for each $j \ge 0$,

$$|c_j| \le \frac{2M}{\rho^j}. \tag{13}$$

(ii) (Trefethen [13]) For an integer $r \ge 1$, if $f(x)$ has an absolutely continuous
$(r-1)$st derivative $f^{(r-1)}$ on $[-1, 1]$ and an $r$th derivative $f^{(r)}$ of bounded
variation $V_r = \mathrm{Var}(f^{(r)}) < \infty$, then for each $j \ge r + 1$,

$$|c_j| \le \frac{2 V_r}{\pi j (j-1) \cdots (j-r)}. \tag{14}$$

Suppose $-1 < x_n < x_{n-1} < \cdots < x_1 < 1$ in decreasing order are the roots of
$P_n^{(\alpha,\beta)}(x)$ ($\alpha, \beta > -1$), and $\{w_j\}_{j=1}^{n}$ are the corresponding weights in the Gauss-
Jacobi quadrature.


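Lemma 2 For $j = 1, 2, \ldots, n$, it follows

$$(x - x_j)\ell_j(x) = \sigma_n (-1)^j \frac{\sqrt{(1 - x_j^2) w_j}}{2^{(\alpha+\beta+1)/2}} \sqrt{\frac{n! \, \Gamma(n + \alpha + \beta + 1)}{\Gamma(n + \alpha + 1)\Gamma(n + \beta + 1)}} \, P_n^{(\alpha,\beta)}(x), \tag{15}$$

where $\sigma_n = +1$ for even $n$ and $\sigma_n = -1$ for odd $n$.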
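Proof Let $z_n = \int_{-1}^{1} (1 - x)^\alpha (1 + x)^\beta [P_n^{(\alpha,\beta)}(x)]^2 \, dx$ and $K_n$ the leading coefficient
of $P_n^{(\alpha,\beta)}(x)$. From Abramowitz and Stegun [1], we have

$$z_n = \frac{2^{\alpha+\beta+1}}{2n + \alpha + \beta + 1} \frac{\Gamma(n + \alpha + 1)\Gamma(n + \beta + 1)}{n! \, \Gamma(n + \alpha + \beta + 1)}, \quad K_n = \frac{1}{2^n} \frac{\Gamma(2n + \alpha + \beta + 1)}{n! \, \Gamma(n + \alpha + \beta + 1)}.$$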

Furthermore, by Szegó [12, (15.3.1)] (also see Wang et al. [15]), we obtain 


А | Кї2п(1— хш, 
(x — х); (х) = aU gym = ои(— y ОВ Р 


saen Ж. nig. 
Zn(2n+a+ B+ 1) Pn 
which implies the desired result (15). О 
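Lemma 3 For $j = 1, 2, \ldots, n$, it follows

$$(1 - x_j^2) w_j = O(n^{-1}). \tag{16}$$

Proof From $w_j = O\Big(\frac{1}{n} \big(\sin\frac{\theta_j}{2}\big)^{2\alpha+1} \big(\cos\frac{\theta_j}{2}\big)^{2\beta+1}\Big)$ (Szegő [12, (15.3.10)]),
we see for $x_j = \cos\theta_j$ that $(1 - x_j^2) w_j = O\Big(\frac{1}{n} \big(\sin\frac{\theta_j}{2}\big)^{2\alpha+3} \big(\cos\frac{\theta_j}{2}\big)^{2\beta+3}\Big)$,
which yields the desired result. □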


Lemma 4 ([10, 16]) For t € [—1,1], let xm be the root of the Jacobi polynomial 
PoP) which is closest to t. Then fork = 1,2,...,n, we have 


e (Ik — mr? + Ik — тр) ‚кт 
O(1) k=m 


x(t) = ‚ y = шаж{а, В}. (17) 


Lemma 5 (Szegő [12, Theorem 8.1.2]) Let $\alpha, \beta$ be real but not necessarily greater
than $-1$ and $x_k = \cos\theta_k$. Then for each fixed $k$, it follows

$$\lim_{n \to \infty} n\theta_k = j_k, \tag{18}$$

where $j_k$ is the $k$th positive zero of the Bessel function $J_\alpha$.
Lemma 6 For $k = 1, 2, \ldots, n$, it follows

$$v_k(x) = 1 - (x - x_k) \frac{\omega_n''(x_k)}{\omega_n'(x_k)} = O(n^2). \tag{19}$$
Proof Note that $P_n^{(\alpha,\beta)}(x)$ satisfies the second order linear homogeneous Sturm-
Liouville differential equation [12, (4.2.1)]

$$(1 - x^2) y'' + (\beta - \alpha - (\alpha + \beta + 2)x) y' + n(n + \alpha + \beta + 1) y = 0.$$

By $\omega_n(x) = P_n^{(\alpha,\beta)}(x)/K_n$ we get

$$\frac{\omega_n''(x_j)}{\omega_n'(x_j)} = -\frac{\beta - \alpha - (\alpha + \beta + 2) x_j}{1 - x_j^2} \quad (\text{see } [12, (14.5.1)]). \tag{20}$$


In addition, by Lemma 5 with x; = cos0;, we see that 0; ~ 1. Similarly, Бу 
PEP (x) = (—1)” PLP (x) we have 0, ~ +. These together yield 


n 


1-х 1 


1 2 2 1 1 1 2 
5 = О(п ), Lc ) 5 < max Үт? = O(n^) 
I n j 1 п 


and then by (20) it deduces the desired result. o 


Theorem 1 Suppose Uia are the roots of pP (х) with a, В > —1, then the 
Hermite-Fejér interpolation (5) for f analytic in Ep with | f(z)| < M at Ulis 
has the convergence rate (9). | 


Proof Since the Chebyshev series expansion of $f(x)$ is uniformly convergent under
the assumptions, and the error of Hermite-Fejér interpolation (5) on Chebyshev
polynomials satisfies $|E(T_j, x)| = |T_j(x) - H^*_{2n-1}(T_j, x)| = 0$ for $j =
0, 1, \ldots, 2n - 1$, it follows that

$$|E(f, x)| = |f(x) - H^*_{2n-1}(f, x)| = \Big| \sum_{j=0}^{\infty}{}' \, c_j E(T_j, x) \Big| \le \sum_{j=2n}^{\infty} |c_j| |E(T_j, x)|. \tag{21}$$

Furthermore, $|E(T_j, x)| = \big|T_j(x) - \sum_{i=1}^{n} T_j(x_i) h_i(x) - \sum_{i=1}^{n} T_j'(x_i) b_i(x)\big|$. In the
following, we will focus on estimates of $|E(T_j, x)|$ for $j \ge 2n$.

In the case у < 0: Notice that the pointsystem is normal which implies h; (x) > 0 
for alli = 1,2,...,n and for all x € [—1, 1], 


12 Y hi) = У). 
i=l 


ї=1 


Then we have 


IS Tj E YMO =1, 1=0,1,.... Q2) 
i=l i=l 
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Additionally, by Lemma 2, it obtains for j = 2n, 2n + 1,... that 


У Tj КӘ GO] 
= jl Ya Uja EDA - x2 
j IT 1 Д 
= нуту re oo) Ya Uja — xDwitjGOl 


= буту ТРИ О) DIL sin(G — 1) агссоз(ху)) Urt; (х)| 


10 (ini Pe) J [ND 


(Ш}—1 is the second kind of Chebyshev polynomial of degree j — 1) since 


п!Г(и-а- В- Г) : А , 
\/ Гана (-ЕВ+Т) 19 uniformly bounded in n fora, В > —1 due to 


кф Е ap ) 
Гп-+а-+ 2)Га+в8+2) — (n 4- D? + (a+ B)(n + 1) + ав 


" Al (n + a 4- B 4-1) 
Ги + o 4 DPG-c 8-1) 


which implies ree is uniformly bounded in n and then 


T D o . 
А т is uniformly bounded. Here Л, = maxyep 1,1] Уу 16: (x)| is 


the Lebesgue constant. Then from 
1 3 
pP (x) = O (n 2), if max{a, В} x — 
3 О (пах, BY). if тах{о, В} > — 


‚= О (n 229.8} if minfa, В} < — 
“| Ошо, if min(o, В} > — 


ыы 


(see Szegó [12, pp 168, 354]) and 


1 
т (12, рр 338]), 


О (log n), if max{a, В} < — 
Ал = а т. 
О (пах), if max(o, В} > — 


we have 


I Т; (| = jt. Q3) 


ї=1 
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Then by (22) and (23), we find | E(Tj, x)| < 2+ jt < 2јт, for j > 2n, and 
consequently 


ЕСА х) = 11%) — H5, 40.200 € Y | ЕСТУ, х) = 27 У Ле, 
j=2n j=2n 


which, directly following [18], leads to the desired result. 
In the case у > 0: From |E(Tj,x)) = |Tj(x) — Ya Tanh) — 
pam T; (xi)bi (x)|, by Lemmas 3 and 6 we obtain 


2s WOIE = О G f eu) = O(n2*?7), 
i=l 1 


and 


туо) — Тушн (х) = TjG) — УСТ д = О (t). 


i=1 i=l 


These together with 


| 2 4 ТУРБО 
j ! 1 А КИТ 
= geom V ОНР РУ" (x) Уу sin((j — 1) arecosQr)) те (x)| 
= jūn 
and then |Е(Т;,х)| = О (j 2+2у ) for j > 2n, similar to the above proof in the case 


of y < 0, implies the desired result. a 


From the definition of ти, we see that when е = В = -j the convergence order 
on n is the lowest. In addition, if f is of limited regularity, we have 


Lemma 7 (Vértesi [14]) Suppose Ua are the roots of PEP (х), for every 
continuous function f (x) we have 


n " Es 2 2 z 
lEn- O= 00) 7 16 (да 3 pra, 
n n 


j=l 
(24) 


where w(f;t) = w(t) is the modulus of continuity of f(x) and у = 
max (o. В, —1 2 
Theorem 2 Suppose bra are the roots of pi) (x) (a, B > —1), and f (x) has 


an absolutely continuous (r — 1)st derivative go on [—1,1] for some r > 3, 
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and a rth derivative f? of bounded variation V, < oo, then the Hermite-Fejér 
interpolation (5) at ba has the convergence rate (11). 


Proof Consider the special functional L(g) = Е, (g, x), where E,(g, x) is defined 
for Vg Е C! ([—1, 1]) by 


En(g,x) = g(x) – Уорд) - У gap -x)6G). 5) 


j=l j=l 


By the Peano kernel theorem for n > г (see Peano [9] or Kowalewski [8]), En (f, x) 
can be represented as 


1 
Ех) = [ fO WK, (да (26) 
—1 


with К, (f) = cL (с E art forr = 3, 4,..., that is 


n 


1 1 н 
K,(t) = =й” – 0) — GDI Le- OF иде) 


(r = 
1 n u 
Б тетт. »2 = кеу б), 
J= 


where 


сз (х= 1) 1, х2 г; 1, х> є; 
и = [9 Кек (К > 2), wh = | (К = 1). 


Moreover, noting that 


1 2 _ : 1 1-3 _ 
219 + -f haat 0+ (а, k=3,4,---, 


we get the following identity 
1 
Ks—1(u) = f K; »(t)dt, 5 = 4, 5, тит 
и 


where Кә (1) is defined by 


Ko(t) = x — D4 — 3G; 00,0) – Уу - 05 — x60). 
1=1 


jai 
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In addition, it can be easily verified that K,(—1) = K,(1) = 0 for s = 2, 3,.... 
Since f” is of bounded variation, directly applying the similar skills of Theorem 
2 and Lemma 4 in [16], we get 


lEn Cf. х) < Vell Kr-+1 оо, Q7) 


and 


К; < sup [K;(f), fors =2,3,---, (28) 


2n — $ Aeris] 
respectively. Then from (27) and (28), we can obtain that 


z' y, 
lEn СУ, х) < On -30n 3)... И. (29) 


In addition, by Lemma 7, we have 


n o (=), y<-i 
lx — 01 — 36; - 0nlvjo)5 @)llo = Á (30) 
2 P7 | об), ve 
while by Lemmas 2—3, we get 
я п О logn у < _1 
17 = 7 
DDOE -хрб OEDDEN ( ч ) н 
= j=l O (n J; у> —5. 
(31) 


logn 1 
о ( n )rz-i 


= o (e), y »-l. 


Finally, We use a function of analytic f(x) = TOSS and a function of limited 


regularity f (x) = |х| to show that the convergence rate of || f (x) — Hx, 1; xX) leo 
is dependent оп o and f in Fig. 3. 
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105 {(х)=1/(1+25х°) 105 f(x)=Ixl® 
—6— a--0.7, 8=-0.8 —6— az-0.9, @=-0.8 
—6— а=-0.3, 8=-0.4 —$— a--0.2, 8=-0.1 
100% —8— az 0.3, = 0.4 m —=— az 0.8, = 0.6 
105 105 
10710 10710 
10-15 | 5 ш 105 | | 
109 10! 10° 10° 10° 10! 10? 10? 
п=1:10:1000 п=1:10:1000 


Fig.3 |H5, (f.x) — Ло) at x = —1 : 0.001 : 1 by using Gauss-Jacobi pointsystem for 
Р(х) = Iss and f(x) = |x|° with different a and B, respectively 

Fifth-Order Finite-Volume WENO
on Cylindrical Grids


Mohammad Afzal Shadab, Xing Ji, and Kun Xu 


1 Introduction 


The conventional WENO scheme is specifically designed for reconstruction in
Cartesian coordinates on uniform grids [1]. Employing a Cartesian-based
reconstruction scheme on a cylindrical grid suffers from a number of drawbacks
[2, 3]. For example, in the original PPM paper, reconstruction was performed in volume
coordinates (rather than linear ones) so that the algorithm for a Cartesian mesh can be used
on a curvilinear mesh. However, the resulting interface states became first-order
accurate even for smooth flows [2]. Another example is the assignment of the volume
average to the geometrical cell center of the finite volume rather than to the centroid [2]. A
breakthrough in the field of high-order reconstruction in cylindrical coordinates is
the application of Vandermonde-like linear systems of equations with spatially
varying coefficients [2]. It is reintroduced in the present work to build a basis for the
derivation of the high-order WENO schemes.

The motivation for the present work is to develop a fifth-order finite-volume 
WENO reconstruction scheme in the efficient dimension-by-dimension framework, 
specifically aimed at regularly-spaced and irregularly-spaced grids in cylindrical 
coordinates. 


2 Finite-Volume Discretization in Curvilinear Coordinates 


2.1 Evaluation of the Linear Weights 


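A non-uniform grid spacing with zone width $\Delta\xi_i = \xi_{i+\frac{1}{2}} - \xi_{i-\frac{1}{2}}$ is considered,
having $\xi \in (x_1, x_2, x_3)$ as the coordinate along the reconstruction direction and
$\xi_{i+\frac{1}{2}}$ denoting the location of the cell interface between zones $i$ and $i + 1$. Let $\bar{Q}_i$
be the cell average of the conserved quantity $Q$ inside zone $i$ at some given time, which
can be expressed in the form of Eq. (1),

$$\bar{Q}_i = \frac{1}{\Delta V_i} \int_{\xi_{i-\frac{1}{2}}}^{\xi_{i+\frac{1}{2}}} Q \, \frac{\partial V}{\partial \xi} \, d\xi, \quad \Delta V_i = \int_{\xi_{i-\frac{1}{2}}}^{\xi_{i+\frac{1}{2}}} \frac{\partial V}{\partial \xi} \, d\xi, \tag{1}$$

where $\Delta V_i$ is the local cell volume of the $i$th cell in the direction of reconstruction, as given
in Eq. (1), and $\frac{\partial V}{\partial \xi}$ is the one-dimensional Jacobian. Now, our aim is to find a $p$th
order accurate approximation to the actual solution by constructing a $(p-1)$th
order polynomial distribution, as given in Eq. (2).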


$$Q_i(\xi) = a_{i,0} + a_{i,1}(\xi - \xi_i^c) + a_{i,2}(\xi - \xi_i^c)^2 + \cdots + a_{i,p-1}(\xi - \xi_i^c)^{p-1}, \tag{2}$$

where $a_{i,n}$ corresponds to a vector of the coefficients which need to be determined
and $\xi_i^c$ can be taken as the cell centroid. However, the final values at the interface are
independent of the particular choice of $\xi_i^c$ and one may as well set $\xi_i^c = 0$ [2]. Unlike
the cell center, the centroid is not equidistant from the cell interfaces in the case
of cylindrical-radial coordinates, and the cell averaged values are assigned at the
centroid [2]. Further, the method has to be locally conservative, i.e., the polynomial
$Q_i(\xi)$ must fit the neighboring cell averages, satisfying Eq. (3).


$$\int_{\xi_{i+s-\frac{1}{2}}}^{\xi_{i+s+\frac{1}{2}}} Q_i(\xi) \, \frac{\partial V}{\partial \xi} \, d\xi = \Delta V_{i+s} \bar{Q}_{i+s} \quad \text{for } -i_L \le s \le i_R, \tag{3}$$

where the stencil includes $i_L$ cells to the left and $i_R$ cells to the right of the $i$th zone
such that $i_L + i_R + 1 = p$. Implementing Eqs. (1)-(2) in Eq. (3) along with a simple
mathematical manipulation leads to Eq. (4), which is the fundamental equation for
reconstruction in cylindrical coordinates. For the detailed derivation, kindly refer to
[3].


T 


ВиО «++ Bi-iz,p-1 Wi i 1 
bo 78 : : = (4) 
$ [4 —1 
Dixig,0 - -- Bi+ir,p—1 Win Gist ==)" 
where '±' represents the positive and negative weights, i.e., the weights for 
reconstructing the right (+) and left (−) interface values, respectively. Also, the 
grid-dependent linear weights ($w^{\pm}_{i,s}$) satisfy the normalization condition [2]. 
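
As a hedged illustration of how Eq. (4) can be solved in practice, the following minimal Python sketch builds the matrix of $\beta_{i+s,n}$ for a power-law Jacobian $\partial\mathcal{V}/\partial\xi = \xi^m$ with $\xi_i^c = 0$ and solves for the interface weights in exact rational arithmetic. The definition of $\beta_{i+s,n}$ as the volume-weighted cell average of $\xi^n$ is an assumption made here (the text defers the derivation to [3]), and the names `beta`, `solve`, and `linear_weights` are hypothetical helpers, not the authors' implementation.

```python
from fractions import Fraction


def beta(lo, hi, n, m):
    """Volume-weighted average of xi**n over [lo, hi] for Jacobian xi**m (assumed form)."""
    moment = (hi ** (n + m + 1) - lo ** (n + m + 1)) / Fraction(n + m + 1)
    volume = (hi ** (m + 1) - lo ** (m + 1)) / Fraction(m + 1)
    return moment / volume


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination; exact when entries are Fractions."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        head = aug[col][col]
        aug[col] = [v / head for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def linear_weights(faces, i, i_left, i_right, m, side=+1):
    """Weights w[s], s = -i_left..i_right, so that the interface value is sum_s w[s]*Qbar[i+s].

    faces[k] and faces[k + 1] bound cell k; side=+1 targets xi_{i+1/2}, side=-1 xi_{i-1/2}.
    """
    p = i_left + i_right + 1
    cells = range(i - i_left, i + i_right + 1)
    b_mat = [[beta(faces[c], faces[c + 1], n, m) for n in range(p)] for c in cells]
    b_transposed = [list(col) for col in zip(*b_mat)]  # Eq. (4) uses the transpose
    target = faces[i + 1] if side > 0 else faces[i]
    rhs = [target ** n for n in range(p)]
    return solve(b_transposed, rhs)


faces = [Fraction(k) for k in range(12)]            # uniform grid, unit spacing (illustrative)
cartesian3 = linear_weights(faces, 5, 1, 1, m=0)    # Cartesian limit, three cells
cartesian5 = linear_weights(faces, 5, 2, 2, m=0)    # Cartesian limit, five cells
radial5 = linear_weights(faces, 5, 2, 2, m=1)       # cylindrical-radial, five cells
print(cartesian3, cartesian5, radial5, sum(radial5))
```

In the Cartesian limit ($m = 0$) the sketch returns the familiar uniform-grid values $(-1/6, 5/6, 1/3)$ and $(2, -13, 47, 27, -3)/60$. For $m = 1$ the weights depend on the cell index but still sum to one, which is the normalization condition mentioned above.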


2.2 Optimal Weights 


For the case of fifth-order WENO interpolation, the third-order interpolated 
variables are optimally weighted in order to achieve fifth-order accurate interpolated 
values, as given in Eq. (5) for the case of $p = 3$ [1]. 

$$q_i^{(2p-1)\pm} = \sum_{l=0}^{p-1} C^{\pm}_{i,l}\, q^{p\pm}_{i,l} \tag{5}$$

where $C^{\pm}_{i,l}$ is the optimal weight for the positive/negative cases on the $i$th finite 
volume. So, Eq. (4) is used again to evaluate the weights for the fifth-order ($2p - 1 = 5$) 
interpolation ($i_L = 2$, $i_R = 2$). 

Linear and optimal weights are independent of the mesh size for standard 
regularly-spaced grid cases. They can be evaluated and stored (at a nominal cost) 
independently before the actual computation. Also, they conform to the original 
WENO-JS [1] for the limiting case ($R \to \infty$). The weights required for source 
term and flux integration in one or more dimensions are given in [3]. 
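
As a quick consistency check of the limiting case, the classical uniform-grid values can be worked through by hand. The stencil ordering ($l = 0$ for the left-most three-cell stencil) and the symbols $q^{3+}_{i,l}$ and $q^{5+}_i$ are assumptions made for this illustration; the numbers are the standard Cartesian WENO-JS values [1], not the grid-dependent cylindrical weights of the present scheme.

```latex
\begin{aligned}
q^{3+}_{i,0} &= \tfrac{1}{6}\left(2\bar{Q}_{i-2} - 7\bar{Q}_{i-1} + 11\bar{Q}_i\right), \\
q^{3+}_{i,1} &= \tfrac{1}{6}\left(-\bar{Q}_{i-1} + 5\bar{Q}_i + 2\bar{Q}_{i+1}\right), \\
q^{3+}_{i,2} &= \tfrac{1}{6}\left(2\bar{Q}_i + 5\bar{Q}_{i+1} - \bar{Q}_{i+2}\right), \\
\tfrac{1}{10}\, q^{3+}_{i,0} + \tfrac{6}{10}\, q^{3+}_{i,1} + \tfrac{3}{10}\, q^{3+}_{i,2}
  &= \tfrac{1}{60}\left(2\bar{Q}_{i-2} - 13\bar{Q}_{i-1} + 47\bar{Q}_i + 27\bar{Q}_{i+1} - 3\bar{Q}_{i+2}\right) = q^{5+}_i .
\end{aligned}
```

Matching the coefficient of each $\bar{Q}_{i+s}$ on both sides gives $C^{+}_{i,0} = 1/10$, $C^{+}_{i,1} = 6/10$, and $C^{+}_{i,2} = 3/10$ in this limit. On a curvilinear grid the same coefficient matching is carried out with the weights obtained from Eq. (4).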


2.3 Smoothness Indicators and the Nonlinear Weights 


The mathematical definition of the smoothness indicator is given in Eq. (6) [1]. 


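$$IS_{i,l} = \sum_{m=1}^{p-1} \int_{\xi_{i-1/2}}^{\xi_{i+1/2}} \left( \frac{d^m}{d\xi^m} Q_{i,l}(\xi) \right)^2 \Delta\xi_i^{2m-1}\, d\xi, \qquad l = 0, \dots, p - 1 \tag{6}$$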


To evaluate the value of $IS_{i,l}$, a third-order polynomial interpolation on the $i$th cell is 
required using the positive and negative reconstructed values of stencil $S_l$, as given in 
Eq. (2). Finally, evaluating the values of the coefficients $a$ and substituting their 
values in the smoothness indicator formula (6) yields the grid-independent fundamental 
relation (7). The nonlinear weight ($\omega^{\pm}_{i,l}$) for the WENO-C interpolation is defined in 
Eq. (8) [1], where $\epsilon$ is chosen to be $10^{-6}$ [1, 3]. 


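$$IS_{i,l} = 4\left(39\bar{Q}_i^2 - 39\bar{Q}_i\,(q^{-}_{i,l} + q^{+}_{i,l}) + 10\left((q^{-}_{i,l})^2 + (q^{+}_{i,l})^2\right) + 19\, q^{-}_{i,l}\, q^{+}_{i,l}\right) \tag{7}$$

$$\omega^{\pm}_{i,l} = \frac{\alpha^{\pm}_{i,l}}{\sum_{k=0}^{p-1} \alpha^{\pm}_{i,k}}, \qquad \alpha^{\pm}_{i,l} = \frac{C^{\pm}_{i,l}}{(\epsilon + IS_{i,l})^2}, \qquad l = 0, 1, 2 \tag{8}$$

The final interpolated interface values are evaluated from Eq. (9). 

$$q_i^{(2p-1)\pm} = \sum_{l=0}^{p-1} \omega^{\pm}_{i,l}\, q^{p\pm}_{i,l} \tag{9}$$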
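
The chain from Eq. (7) through Eq. (9) can be sketched as follows. This is a minimal illustration, not the authors' code: the third-order candidate weights and the optimal weights (0.1, 0.6, 0.3) are the uniform-grid Cartesian-limit values, used here as stand-ins for the grid-dependent weights of Eq. (4); the squared denominator in `nonlinear_weights` follows the usual WENO-JS form [1]; and all function and variable names are hypothetical.

```python
def smoothness(q_bar, q_minus, q_plus):
    """Eq. (7): smoothness indicator from the cell average and the two interface values."""
    return 4.0 * (39.0 * q_bar ** 2 - 39.0 * q_bar * (q_minus + q_plus)
                  + 10.0 * (q_minus ** 2 + q_plus ** 2) + 19.0 * q_minus * q_plus)


def nonlinear_weights(optimal, indicators, eps=1.0e-6):
    """Eq. (8): alpha_l = C_l / (eps + IS_l)**2, normalised so the weights sum to one."""
    alphas = [c / (eps + s) ** 2 for c, s in zip(optimal, indicators)]
    total = sum(alphas)
    return [a / total for a in alphas]


# Cartesian-limit third-order weights (assumption: stand-ins for the weights of Eq. (4)).
# Each entry: (offset of the first stencil cell, weights for q_minus, weights for q_plus).
STENCILS = [
    (-2, (-1 / 6, 5 / 6, 2 / 6), (2 / 6, -7 / 6, 11 / 6)),
    (-1, (2 / 6, 5 / 6, -1 / 6), (-1 / 6, 5 / 6, 2 / 6)),
    (0, (11 / 6, -7 / 6, 2 / 6), (2 / 6, 5 / 6, -1 / 6)),
]
OPTIMAL_PLUS = (0.1, 0.6, 0.3)


def reconstruct_right(q, i):
    """Eq. (9) for the right (+) interface of cell i, given the list of cell averages q."""
    candidates, indicators = [], []
    for first, w_minus, w_plus in STENCILS:
        cells = q[i + first:i + first + 3]
        q_minus = sum(w * c for w, c in zip(w_minus, cells))
        q_plus = sum(w * c for w, c in zip(w_plus, cells))
        candidates.append(q_plus)
        indicators.append(smoothness(q[i], q_minus, q_plus))
    omegas = nonlinear_weights(OPTIMAL_PLUS, indicators)
    return sum(w * c for w, c in zip(omegas, candidates)), omegas


smooth_value, smooth_omegas = reconstruct_right([1.0, 2.0, 3.0, 4.0, 5.0], 2)
step_value, step_omegas = reconstruct_right([1.0, 1.0, 1.0, 0.0, 0.0], 2)
print(smooth_value, smooth_omegas)
print(step_value, step_omegas)
```

For linear data the three indicators coincide, so the nonlinear weights fall back to the optimal ones. For the step profile nearly all the weight goes to the stencil that does not cross the jump, which is the ENO behaviour the scheme is designed for.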


3 Stability Analysis of WENO-C for Hyperbolic Conservation Laws 


For WENO-C to be practically useful, it is crucial that it enables a stable discretization 
for hyperbolic conservation laws when coupled with a proper time-integration 
scheme. In this section, we analyze the WENO-C scheme for model problems involving 
smooth flow in 1D cylindrical-radial coordinates, based on a modified von Neumann 
stability analysis [4]. We consider the scalar advection equation (10) in 1D 
cylindrical-radial coordinates. 

$$\frac{\partial Q}{\partial t} + \frac{1}{(\partial\mathcal{V}/\partial\xi)} \frac{\partial}{\partial\xi}\left( \frac{\partial\mathcal{V}}{\partial\xi}\, v\, Q \right) = 0, \qquad \xi \in [0, \infty),\ t > 0 \tag{10}$$


where $Q$ is the conserved variable and $(\partial\mathcal{V}/\partial\xi) = \xi$ is the one-dimensional Jacobian in 
cylindrical-radial coordinates. Boundary conditions are not considered in the present 
approach to reduce the complexity of the analysis. We assume a uniform grid with 
$\xi_i = i\,\Delta\xi$ and $\xi_{i+1/2} - \xi_{i-1/2} = \Delta\xi\ \forall\, i$, where $(i \pm 1/2)$ denotes the boundaries of 
finite volume $i$. In the finite-volume framework, Eq. (10) transforms into the conservative 
scheme given in Eq. (11). 

$$\frac{\partial \bar{Q}_i}{\partial t} = -\frac{1}{\Delta\mathcal{V}_i}\left( \hat{F}_{i+1/2} - \hat{F}_{i-1/2} \right) \tag{11}$$


where the numerical flux $\hat{F}_{i\pm 1/2}$ is the Lax-Friedrichs flux, and $\bar{Q}_i$ and $\Delta\mathcal{V}_i$ are given in 
Eq. (1). For this particular problem, let $v = 1$ in Eq. (10). Since the advection speed is 
positive, only the values on the left side of each interface are considered. Based on the 
von Neumann stability analysis, the semi-discrete solution can be expressed as a discrete 
Fourier series. By the superposition principle, only one term in the series can be used 
for analysis, as illustrated in Eq. (12). 

$$\bar{Q}_i(t) = \hat{Q}_k(t)\, e^{j i \theta_k}, \quad \text{where } j = \sqrt{-1} \tag{12}$$


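By substituting Eq. (12) in Eq. (11), we can separate the spatial operator $L$, as 
given in Eq. (13). 

$$L = -\frac{\hat{F}_{i+1/2} - \hat{F}_{i-1/2}}{\Delta\mathcal{V}_i} = -\frac{[Q\,(\partial\mathcal{V}/\partial\xi)]_{i+1/2} - [Q\,(\partial\mathcal{V}/\partial\xi)]_{i-1/2}}{\Delta\mathcal{V}_i} = -\frac{z(\theta_k)\, \bar{Q}_i}{\Delta\xi} \tag{13}$$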


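where the complex function $z(\theta_k)$ is the Fourier symbol. By substituting the values 
of $Q_{i-1/2}$ and $Q_{i+1/2}$ using the fifth-order positive weights of cells $(i - 1)$ and $i$, 
respectively, for a smooth solution, the value of $z(\theta_k)$ for WENO-C can be evaluated 
using Eq. (14). 

$$z(\theta_k) = \frac{(m+1)\,\Delta\xi}{\xi_{i+1/2}^{m+1} - \xi_{i-1/2}^{m+1}} \sum_{l=-2}^{+2} \left( \xi_{i+1/2}^{m}\, w^{+}_{i,l}\, e^{j l \theta_k} - \xi_{i-1/2}^{m}\, w^{+}_{i-1,l}\, e^{j(l-1)\theta_k} \right) \tag{14}$$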


where $m = 1$ for cylindrical-radial coordinates. Using the same approach as given in 
[4], we can plot the spatial spectrum $\{S : -z(\theta_k) \text{ for } \theta_k \in [0, 2\pi]\}$ and the stability 
domain $S_t$ for TVD-RK order 3. The maximum stable CFL number of this scheme 
can be computed by finding the largest rescaling parameter $\sigma$, so that the rescaled 
spectrum still lies in the stability domain. 
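
The rescaling procedure described above can be sketched numerically. The following is a simplified illustration under stated assumptions: it uses the uniform-grid Cartesian-limit fifth-order weights $(2, -13, 47, 27, -3)/60$ rather than the index-dependent WENO-C weights, it drops the geometric factors $\xi^m$ of Eq. (14), and it takes the stability polynomial common to all three-stage, third-order RK schemes. The names `fourier_symbol`, `is_stable`, and `max_cfl` are hypothetical.

```python
import cmath
import math

# Fifth-order upwind-biased linear weights in the Cartesian limit (cells i-2 .. i+2).
WEIGHTS = {-2: 2 / 60, -1: -13 / 60, 0: 47 / 60, 1: 27 / 60, 2: -3 / 60}


def fourier_symbol(theta):
    """z(theta) in the sense of Eq. (13): dQ_i/dt = -z(theta) Q_i / dx for unit speed."""
    return sum(w * (cmath.exp(1j * l * theta) - cmath.exp(1j * (l - 1) * theta))
               for l, w in WEIGHTS.items())


def rk3_amplification(mu):
    """Stability polynomial shared by three-stage, third-order RK methods."""
    return 1 + mu + mu ** 2 / 2 + mu ** 3 / 6


# Sample the spatial spectrum once on theta in [0, 2*pi].
THETAS = [2 * math.pi * k / 360 for k in range(361)]
SPECTRUM = [fourier_symbol(t) for t in THETAS]


def is_stable(cfl, tol=1.0e-10):
    """True if every sampled point of the rescaled spectrum -cfl*z is amplified by <= 1."""
    return all(abs(rk3_amplification(-cfl * z)) <= 1 + tol for z in SPECTRUM)


def max_cfl(step=1.0e-3, upper=3.0):
    """Scan upward in increments of `step` and stop at the first unstable value."""
    cfl = 0.0
    while cfl + step <= upper and is_stable(cfl + step):
        cfl += step
    return cfl


cfl_limit = max_cfl()
print(f"estimated maximum stable CFL (Cartesian limit, RK3): {cfl_limit:.3f}")
```

Replacing `WEIGHTS` with the cell-dependent weights and restoring the geometric factors of Eq. (14) would give one spectrum, and hence one CFL estimate, per cell index $i$.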

It can be observed from Fig. 1 that the spatial spectra $S$ of WENO-C differ 
at low index numbers $i$ owing to the geometrical variation of the finite 
volumes. However, the spectra coincide for high index numbers ($i$), similar 
to WENO-JS, as the fifth-order interpolation weights converge. Some regions 
($i = 1, 2$) require boundary conditions and thus are not considered in the present 
analysis. The maximum stable CFL numbers for cylindrical-radial coordinates lie between 
1.45 and 1.52. As a final remark, it can be concluded that the proposed scheme is 
linearly stable with RK methods of third or higher order, given an appropriate value of 
the CFL number for this case. 


4 Numerical Tests 


In this section, several tests on the Euler equations are performed to analyze the 
performance of the WENO-C reconstruction scheme. Tests are performed on a 
gamma-law gas ($\gamma = 1.4$) in cylindrical coordinates to investigate the essentially 
non-oscillatory property of WENO-C for discontinuous flows and the convex 
combination property for smooth flows. For first-order and second-order (MUSCL) 
spatial reconstructions, Euler time marching and MacCormack (predictor-corrector) 
schemes are employed, respectively. For WENO-C, time marching is done with 
TVD-RK order 3 for 1D cases and RK order 5 for the 2D case. 


4.1 Acoustic Wave Propagation 


A smooth problem involving a nonlinear system of 1D gas dynamical equations is 
solved to test the fifth-order accuracy of the spatial discretization scheme [3]. The Euler 
equations in cylindrical-radial coordinates can be written in the form of Eq. (15). 

$$\frac{\partial}{\partial t}\begin{pmatrix} \rho \\ \rho u \\ E \end{pmatrix} + \frac{1}{R}\frac{\partial}{\partial R}\begin{pmatrix} \rho u R \\ (\rho u^2 + p) R \\ (E + p) u R \end{pmatrix} = \begin{pmatrix} 0 \\ p/R \\ 0 \end{pmatrix} \tag{15}$$

where $\rho$ is the mass density, $u$ is the radial velocity, $p$ is the pressure, and $E$ is the 
total energy. Equation (16) serves as the adiabatic equation of state. 

$$E = \frac{p}{\gamma - 1} + \frac{1}{2}\rho u^2 \tag{16}$$


The initial conditions are provided in Eq. (17) with the perturbation given in 
Eq. (18). 

$$\rho(R, 0) = 1 + \varepsilon f(R), \qquad u(R, 0) = 0, \qquad p(R, 0) = 1/\gamma + \varepsilon f(R) \tag{17}$$

$$f(R) = \begin{cases} \dfrac{\sin^4(5\pi R)}{R} & \text{if } 0.4 \le R \le 0.6 \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

A sufficiently small perturbation with $\varepsilon = 10^{-4}$ yields a smooth solution. The 
interface flux is evaluated using the Rusanov scheme [3] with a CFL number of 0.3. 
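
A minimal sketch of this initialization, as read from Eqs. (16)-(18), is given below. It samples the profiles pointwise at the cell centers of a unit-length domain, which is a simplifying assumption: a finite-volume code would initialize volume-weighted cell averages instead. The names `perturbation` and `initial_state` are hypothetical.

```python
import math

GAMMA = 1.4        # gamma-law gas used throughout the tests
EPSILON = 1.0e-4   # perturbation amplitude


def perturbation(r):
    """Eq. (18): compactly supported bump, nonzero only for 0.4 <= r <= 0.6."""
    if 0.4 <= r <= 0.6:
        return math.sin(5.0 * math.pi * r) ** 4 / r
    return 0.0


def initial_state(r):
    """Eq. (17) with Eq. (16): returns (density, velocity, pressure, total energy)."""
    rho = 1.0 + EPSILON * perturbation(r)
    u = 0.0
    p = 1.0 / GAMMA + EPSILON * perturbation(r)
    energy = p / (GAMMA - 1.0) + 0.5 * rho * u ** 2
    return rho, u, p, energy


n_zones = 16                                    # coarsest grid of the convergence study
dr = 1.0 / n_zones                              # unit-length domain (assumed to be [0, 1])
centers = [(k + 0.5) * dr for k in range(n_zones)]
states = [initial_state(r) for r in centers]    # pointwise values at cell centers
print(states[n_zones // 2])
```

The perturbation peaks at $R = 0.5$, where $f = 2$, so the density and pressure there exceed their background values by $2\varepsilon$.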

The initial perturbation splits into two acoustic waves traveling in opposite 
directions. The final time ($t = 0.3$) is set such that the waves remain in the 
domain and the problem is free from boundary effects. The computational 
domain of unit length is uniformly divided into $N$ zones, with $N = 
16, 32, 64, 128, 256$. Although an exact solution is known up to $O(\varepsilon^2)$, the 
solution on the finest mesh $N = 1024$ is taken as the reference. Figure 2 illustrates 
the spatial variation of density at $t = 0.3$ inside the domain. From Table 1, it is clear 
that the scheme approaches the desired fifth-order accuracy. 

Fig. 2 Spatial profiles of density at t = 0.3 for acoustic wave propagation test in cylindrical-radial 
coordinates 


Table 1 L1 norm errors and order of convergence for the acoustic wave propagation test 

N   | L1 error    | Order 
16  | [illegible] | - 
32  | 4.91E-06    | 1.036 
64  | 6.74E-07    | 2.865 
128 | 3.24E-08    | 4.380 
256 | 1.27E-09    | 4.670 
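
The order column can be reproduced from the error column. The short sketch below uses only the errors for N = 32 to 256 as read from Table 1 and assumes the usual definition of the observed order between successive grid doublings; the function name `observed_orders` is hypothetical.

```python
import math

# L1 errors from Table 1 for successive grid doublings (N = 32, 64, 128, 256).
errors = {32: 4.91e-06, 64: 6.74e-07, 128: 3.24e-08, 256: 1.27e-09}


def observed_orders(err):
    """Order between consecutive resolutions: log(E_coarse / E_fine) / log(N_fine / N_coarse)."""
    sizes = sorted(err)
    return {fine: math.log(err[coarse] / err[fine]) / math.log(fine / coarse)
            for coarse, fine in zip(sizes, sizes[1:])}


orders = observed_orders(errors)
for n, order in orders.items():
    print(n, round(order, 3))
```

The values agree with the tabulated orders to within the rounding of the three-digit errors.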


4.2 Sedov Explosion Test 


The Sedov explosion test is performed to investigate the code's ability to deal with strong 
shocks and non-planar symmetry [3]. The problem involves the self-similar evolution 
of a cylindrical blast wave on a uniform grid ($N = 100$) from a localized 
initial pressure perturbation (delta function) in an otherwise homogeneous medium. 
The governing equations are given in Eq. (15), and the fluxes are evaluated with the 
Rusanov scheme (RS) and GKS [5]. For the code initialization, a dimensionless energy $e = 1$ 
is deposited into a small region of radius $\delta = 3\Delta R$. Inside this region, the 
dimensionless pressure $P'_0$ is given by Eq. (19). 

$$P'_0 = \frac{3(\gamma - 1)\, e}{(m + 2)\, \pi\, \delta^{m+1}} \tag{19}$$


where $m = 1$ for cylindrical geometry. A reflecting boundary condition is employed 
at the center ($R = 0$), whereas a boundary condition at $R = 1$ is not required for this 
problem. The initial velocity and density inside the domain are 0 and 1, respectively, 
and the initial pressure everywhere except the kernel is $10^{-5}$. As the source term is 
very stiff, the CFL number is set to 0.1. The final time is $t = 0.05$. 
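
For orientation, Eq. (19) can be evaluated for the stated setup. The sketch assumes a unit-length domain, so that $\Delta R = 1/N = 0.01$ and $\delta = 0.03$, uses $\gamma = 1.4$ from the test description, and reads the exponent of $\delta$ in Eq. (19) as $m + 1$; these are assumptions of this illustration.

```latex
P'_0 = \frac{3(\gamma - 1)\, e}{(m + 2)\, \pi\, \delta^{m+1}}
     = \frac{3 \times 0.4 \times 1}{3\pi\, (0.03)^2}
     = \frac{0.4}{\pi \times 9 \times 10^{-4}}
     \approx 141.5
```

For $m = 1$ the expression reduces to $(\gamma - 1)\, e / (\pi \delta^2)$, i.e., $(\gamma - 1)$ times the deposited energy per unit volume of the kernel. The resulting pressure is roughly seven orders of magnitude above the ambient value of $10^{-5}$.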

Figure 3 shows that the density peak for WENO-C is higher and is closest to 
the analytical value, similar to the fifth-order finite-difference version [3], but MUSCL 
has higher offset peaks for pressure and velocity. GKS performs slightly better than 
RS, as the peaks are slightly higher in all cases. 

Fig. 3 Variation of density, velocity, and pressure with the radius for Sedov explosion test in 
cylindrical-radial coordinates. Domain is restricted to R = 0.4 for the sake of clarity 


4.3 Modified 2D Riemann Problem in (R − z) Coordinates 


The final test for the present scheme involves a modified 2D Riemann problem 
in cylindrical (R − z) coordinates, as illustrated in Fig. 4 (top left). The problem 
involves two contact discontinuities and two shocks as the initial condition, resulting 
in the formation of a self-similar structure propagating towards the low-density, 
low-pressure region (region 3). The governing equations in cylindrical (R − z) 
coordinates are provided in Eq. (20). 

The computations are performed until t = 0.2 with a CFL number of 0.5 on a 
domain (R, z) = [0, 1] × [0, 1] divided into 500 × 500 zones. The boundary conditions 
are symmetry at the center (except for the antisymmetric radial velocity) and 
outflow elsewhere. The HLL Riemann solver is used for flux evaluations. Rich 
small-scale structures in the contact-contact region (region 1) can be observed from 
Fig. 4 for WENO-C reconstruction, when compared with first-order and second-order 
MUSCL reconstruction. Structures are highly smeared for the case of first-order 
reconstruction. 

Fig. 4 Modified 2D Riemann problem in cylindrical (R − z) coordinates: schematic (top left), 
density contours at t = 0.2 with first-order (top right), second-order MUSCL (bottom left), and 
WENO-C (bottom right) reconstruction schemes 


5 Conclusions 


The fifth-order finite-volume WENO-C reconstruction scheme is proposed for 
structured grids in cylindrical coordinates to achieve high-order spatial accuracy 
along with ENO transition. A grid-independent smoothness indicator is derived 
for this scheme. For uniform grids, the analytical values in cylindrical-radial 
coordinates for the limiting case ($R \to \infty$) conform to WENO-JS. Linear stability 
analysis of the present scheme is performed using a scalar advection equation 
in radial coordinates. Several tests involving smooth and discontinuous flows are 
performed, which confirm the fifth-order accuracy and ENO property of the 
scheme. 